PDC09 – Pre-Conference – Windows Bootcamp Part 3/6 - Working Set Background

Working Set Background

  • Optimal usage of system memory – a constant area of investment
  • Working set: Comprises all the potentially trimmable virtual addresses for a given process, session or system resource
  • Resources like nonpaged pool, kernel stacks, large pages & AWE regions are excluded
  • Working sets provide an efficient way for the system to make memory available under pressure … but maintaining them is not free

Working Set Aging/Trimming

  • Working sets are periodically aged to improve trim decisions
  • Which sets and which virtual addresses to trim?
  • How much to trim?
  • Memory events so applications can trim?

General Policies of Working Sets

  • How is optimal usage achieved?
    • Working sets are ordered based on their age distribution
    • Trim goal is set higher to avoid subsequent additional trimming
    • After the goal is met, other sets continue to be trimmed, but just for their very old pages. This provides fairness, so one process doesn’t surrender pages while the others do not.
    • Up to 4 passes may be performed; later passes consider higher percentages of each working set and lower ages (more recently accessed) as well
    • When trimming does occur, all sets are also aged so future trims have optimal aging
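
The policies above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual Windows memory manager logic: the `WorkingSet` class, the 0..7 age scale, the per-pass percentages (25%, 50%, 75%, 100%) and the rule that pass `p` accepts ages down to `MAX_AGE - p` are all hypothetical choices made for illustration. Only the ordering by age distribution, the limit of 4 passes, the "very old pages only" rule after the goal is met, and the aging of all sets after a trim come from the notes.

```python
from dataclasses import dataclass, field

MAX_AGE = 7      # assumed oldest age value (hypothetical scale 0..7)
MAX_PASSES = 4   # "up to 4 passes" from the notes above


@dataclass
class WorkingSet:
    name: str
    ages: list = field(default_factory=list)  # one age per resident page

    def old_pages(self, min_age):
        """Count pages whose age is at least min_age."""
        return sum(1 for a in self.ages if a >= min_age)

    def trim(self, min_age, max_fraction):
        """Drop pages aged >= min_age, capped at max_fraction of the set."""
        limit = int(len(self.ages) * max_fraction)
        kept, trimmed = [], 0
        for a in sorted(self.ages, reverse=True):  # oldest pages first
            if a >= min_age and trimmed < limit:
                trimmed += 1
            else:
                kept.append(a)
        self.ages = kept
        return trimmed

    def age_all(self):
        """Make every remaining page one step older, saturating at MAX_AGE."""
        self.ages = [min(a + 1, MAX_AGE) for a in self.ages]


def trim_working_sets(sets, goal):
    """Trim until `goal` pages are freed; return the number actually freed."""
    total = 0
    for p in range(MAX_PASSES):
        min_age = MAX_AGE - p            # later passes accept younger pages
        fraction = (p + 1) / MAX_PASSES  # and a larger share of each set
        # Sets holding the most old pages are visited first.
        ordered = sorted(sets, key=lambda ws: ws.old_pages(min_age), reverse=True)
        for ws in ordered:
            if total < goal:
                total += ws.trim(min_age, fraction)
            else:
                # Goal met: remaining sets give up only their very old pages.
                total += ws.trim(MAX_AGE, 1.0)
        if total >= goal:
            break
    for ws in sets:  # a trim also ages every set
        ws.age_all()
    return total
```

A caller would pass a goal somewhat larger than the immediate shortfall, reflecting the "trim goal is set higher" bullet; how much larger is not stated in the notes.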

Working Set Improvements

  • Expansion to 8 aging values
  • Keep exact age distribution counts instead of estimates
  • Force self-aging and trimming during rapid expansion
  • Don’t skip processes due to lock contention and ensure fair aging by removing pass limits
  • Separation of the system cache working set into 3 distinct working sets (system cache, paged pool and driver images)
    • Now we can apply minimums to each one, making it more manageable and interesting
  • Factor in standby list repurposing when making age/trim decisions
    • Any pages that are referenced by your application are immediately protected, because standby list repurposing is factored in
  • Improved inpage clustering of system addresses
  • RESULTS: Doubling of performance in memory constrained systems 
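
As a rough illustration of "8 aging values" combined with "exact age distribution counts", the sketch below keeps a per-age counter that is updated on every access and every aging pass, so the number of pages at or above a given age can be read off without scanning or estimating. The `AgeHistogram` class and its method names are hypothetical; the real bookkeeping inside the memory manager is not described in these notes.

```python
NUM_AGES = 8  # ages 0..7, matching the "8 aging values" bullet


class AgeHistogram:
    """Hypothetical bookkeeping: exact count of pages at each age."""

    def __init__(self):
        self.page_age = {}            # virtual page -> current age
        self.counts = [0] * NUM_AGES  # counts[a] == pages currently at age a

    def touch(self, page):
        """Page was accessed: reset it to age 0 (inserting it if new)."""
        old = self.page_age.get(page)
        if old is not None:
            self.counts[old] -= 1
        self.page_age[page] = 0
        self.counts[0] += 1

    def age(self):
        """One aging pass: every page gets one step older, saturating."""
        oldest = NUM_AGES - 1
        for page, a in self.page_age.items():
            self.page_age[page] = min(a + 1, oldest)
        shifted = [0] * NUM_AGES
        for a, count in enumerate(self.counts):
            shifted[min(a + 1, oldest)] += count
        self.counts = shifted

    def pages_at_least(self, min_age):
        """Exact number of pages whose age is >= min_age."""
        return sum(self.counts[min_age:])
```

With exact counts, a trimmer can tell in advance how many pages a given age threshold would yield, instead of discovering it while walking the set.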

PFN Lock Background

  • The Page Frame Number (PFN) array is virtually contiguous (but can be physically sparse)
  • The Problem
    • The vast majority of virtual memory operations were synchronized via a single system-wide PFN lock.
    • Large numbers of processors and large memory sizes intensify the lock pressure. Prior to this change, SQL Server had an 88% PFN lock contention rate on systems with 128 processors
    • Applications and device drivers seeking higher performance faced significant complexity at best
  • The Answer
    • In Windows 7, the system-wide PFN lock was replaced with fine-grained locking on an individual page basis
    • This completely eliminated the bottleneck, resulting in much higher scalability.
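
The difference between one system-wide lock and per-page locking can be sketched in a few lines. This is a structural illustration only: `PfnDatabase`, `share_counts` and `add_share` are hypothetical names, a lock object per entry is an assumption about shape rather than a description of the Windows 7 implementation, and Python's interpreter lock means the sketch shows the locking pattern, not real multiprocessor scaling.

```python
import threading


class PfnDatabase:
    """Hypothetical PFN array with one lock per page entry."""

    def __init__(self, num_pages):
        self.share_counts = [0] * num_pages
        # Fine-grained: each entry has its own lock, so updates to
        # different pages never wait on each other.
        self.page_locks = [threading.Lock() for _ in range(num_pages)]

    def add_share(self, pfn):
        with self.page_locks[pfn]:  # only this page's entry is locked
            self.share_counts[pfn] += 1


def hammer(db, pfn, times):
    """Repeatedly update a single page entry."""
    for _ in range(times):
        db.add_share(pfn)


def run_demo(num_pages=4, times=1000):
    """Two threads per page update their entry concurrently."""
    db = PfnDatabase(num_pages)
    threads = [
        threading.Thread(target=hammer, args=(db, pfn, times))
        for pfn in range(num_pages)
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db.share_counts
```

With a single global lock, all eight threads in `run_demo` would serialize on the same object; with per-page locks, only the two threads working on the same page contend with each other.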
