PDC09 – Pre-Conference – Windows Bootcamp Part 3/6 - Working Set Background
Working Set Background
- Optimal usage of system memory – a constant area of investment
- Working set: Comprises all the potentially trimmable virtual addresses for a given process, session, or system resource (see the sketch after this list)
- Resources like nonpaged pool, kernel stacks, large pages & AWE regions are excluded
- Working sets provide an efficient way for the system to make memory available under pressure … but maintaining them is not free
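To make the concept concrete from an application's point of view, the sketch below reads the calling process's current and peak working-set sizes through the documented psapi API (GetProcessMemoryInfo); the output formatting is just illustrative.

```c
/* Link with psapi.lib.  Prints the calling process's working set: the set
   of its virtual pages currently resident in physical memory, which are
   the pages the system can trim under memory pressure. */
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };

    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc))) {
        printf("Working set:      %zu KB\n", pmc.WorkingSetSize / 1024);
        printf("Peak working set: %zu KB\n", pmc.PeakWorkingSetSize / 1024);
    }
    return 0;
}
```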
Working Set Aging/Trimming
- Working sets are periodically aged to improve trim decisions
- Which sets and which virtual addresses to trim?
- How much to trim?
- Memory events so applications can trim themselves? (see the sketch after this list)
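One possible answer on the application side, sketched under the assumption that a process wants to cooperate with trimming: poll the documented low-memory resource notification and, when it is signaled, ask the system to empty the process's own working set. The single-shot check and error handling are illustrative only.

```c
/* Poll the system-wide low-memory notification object and, when it is
   signaled, ask the memory manager to empty this process's working set
   so its pages move to the standby/modified lists. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE lowMem = CreateMemoryResourceNotification(LowMemoryResourceNotification);
    if (lowMem == NULL)
        return 1;

    BOOL isLow = FALSE;
    if (QueryMemoryResourceNotification(lowMem, &isLow) && isLow) {
        /* (SIZE_T)-1 for both limits asks the system to trim the
           working set as much as possible. */
        SetProcessWorkingSetSize(GetCurrentProcess(), (SIZE_T)-1, (SIZE_T)-1);
        printf("Low-memory notification observed; working set trimmed.\n");
    }

    CloseHandle(lowMem);
    return 0;
}
```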
General Policies of Working Sets
- How is optimal usage achieved?
- Working sets are ordered based on their age distribution
- The trim goal is set higher than the immediate need to avoid additional trimming shortly afterward
- After the goal is met, other sets continue to be trimmed, but just for their very old pages. This provides fairness, so one process doesn't surrender pages while the others don't.
- Up to 4 passes may be performed; later passes consider higher percentages of each working set and lower ages (more recently accessed pages) as well (see the sketch after this list)
- When trimming does occur, all sets are also aged so future trims have optimal aging
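A minimal sketch of this multi-pass policy, using purely hypothetical structures and constants (the working_set layout, the 25% step per pass, and the number of age buckets are assumptions for illustration, not the actual kernel data structures):

```c
#include <stddef.h>

#define AGE_BUCKETS 8   /* assumed; matches the 8 aging values in the next section */
#define MAX_PASSES  4

typedef struct working_set {
    size_t pages_at_age[AGE_BUCKETS];  /* resident pages bucketed by age (0 = newest) */
    struct working_set *next;          /* sets kept ordered by age distribution        */
} working_set;

/* Remove up to 'budget' pages whose age is >= min_age from one set. */
static size_t trim_pages(working_set *ws, int min_age, size_t budget)
{
    size_t freed = 0;
    for (int a = AGE_BUCKETS - 1; a >= min_age && freed < budget; a--) {
        size_t take = ws->pages_at_age[a];
        if (take > budget - freed)
            take = budget - freed;
        ws->pages_at_age[a] -= take;
        freed += take;
    }
    return freed;
}

/* Trim toward 'goal' pages; later passes consider a larger fraction of
   each set and younger (more recently accessed) pages as well. */
size_t trim_to_goal(working_set *sets, size_t goal)
{
    size_t freed = 0;

    for (int pass = 0; pass < MAX_PASSES && freed < goal; pass++) {
        int min_age = (AGE_BUCKETS - 1) - pass;   /* start with the very oldest pages */
        int percent = 25 * (pass + 1);            /* take a bigger share on each pass */

        for (working_set *ws = sets; ws != NULL; ws = ws->next) {
            size_t total = 0;
            for (int a = 0; a < AGE_BUCKETS; a++)
                total += ws->pages_at_age[a];

            /* Fairness: once the goal is met, keep visiting the remaining
               sets but only take their very old pages. */
            int age_floor = (freed >= goal) ? AGE_BUCKETS - 1 : min_age;
            freed += trim_pages(ws, age_floor, total * percent / 100);
        }
    }
    return freed;
}
```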
Working Set Improvements
- Expansion to 8 aging values
- Keep exact age distribution counts instead of estimates (see the sketch after this list)
- Force self-aging and trimming during rapid expansion
- Don’t skip processes due to lock contention and ensure fair aging by removing pass limits
- Separation of the system cache working set into 3 distinct working sets (system cache, paged pool and driver images)
- Minimums can now be applied to each one, making them more manageable
- Factor in standby list repurposing when making age/trim decisions
- Any pages that are referenced by your application are immediately protected from standby-list repurposing
- Improved inpage clustering of system addresses
- RESULTS: Doubling of performance on memory-constrained systems
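As an illustration of the "exact age distribution counts" point, the sketch below keeps per-age counters updated whenever a page is accessed or aged, so a trim pass can read the distribution directly rather than estimating it. The structure and function names are hypothetical, not the actual kernel bookkeeping.

```c
#include <stddef.h>

#define AGE_BUCKETS 8   /* matches the expansion to 8 aging values above */

typedef struct working_set_counts {
    size_t pages_at_age[AGE_BUCKETS];
} working_set_counts;

/* The page was just accessed: reset its age to 0 (most recently used). */
static void note_access(working_set_counts *c, unsigned *page_age)
{
    c->pages_at_age[*page_age]--;
    c->pages_at_age[0]++;
    *page_age = 0;
}

/* An aging pass visited the page without seeing an access: grow its age. */
static void note_aging(working_set_counts *c, unsigned *page_age)
{
    if (*page_age < AGE_BUCKETS - 1) {
        c->pages_at_age[*page_age]--;
        c->pages_at_age[*page_age + 1]++;
        (*page_age)++;
    }
}
```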
PFN Lock Background
- The Page Frame Number (PFN) array is virtually contiguous (but can be physically sparse)
- The Problem
- The vast majority of virtual memory operations were synchronized via a single system-wide PFN lock.
- Larger processor counts and memory sizes amplify the lock pressure. Prior to this change, SQL Server had an 88% PFN lock contention rate on systems with 128 processors
- Applications and device drivers seeking higher performance faced significant complexity at best
- The Answer
- In Windows 7 the system-wide PFN lock was replaced with fine-grained locking on an individual-page basis (as sketched below)
- This completely eliminated the bottleneck, resulting in much higher scalability.
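To illustrate the shape of that change (not the actual Windows PFN database layout), the sketch below contrasts a single global lock with a per-entry lock carried by each hypothetical PFN entry, so operations on unrelated pages no longer serialize on one another.

```c
#include <windows.h>

/* Illustrative only: each PFN entry carries its own lock, so two threads
   touching different physical pages never contend with each other. */
typedef struct pfn_entry {
    SRWLOCK lock;        /* per-page lock (the Windows 7 model)        */
    ULONG   ref_count;   /* example of per-page state the lock guards  */
} pfn_entry;

/* Before Windows 7: every caller contended on one global lock, e.g.
   SRWLOCK g_pfn_lock = SRWLOCK_INIT; acquired for every page operation. */

static void init_pfn_entry(pfn_entry *pfn)
{
    InitializeSRWLock(&pfn->lock);
    pfn->ref_count = 0;
}

static void reference_page(pfn_entry *pfn)
{
    AcquireSRWLockExclusive(&pfn->lock);   /* serializes only this page */
    pfn->ref_count++;
    ReleaseSRWLockExclusive(&pfn->lock);
}
```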