[SERVER-74038] [Windows] Possible negative performance effects of SetProcessWorkingSetSize in SecureAllocator Created: 15/Feb/23  Updated: 29/Oct/23  Resolved: 27/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0, 4.4.20, 5.0.16, 6.0.6

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Mark Benvenuto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File windows_bad_memory_trim.png     PNG File windows_good_memory_trim.png    
Issue Links:
Backports
Related
related to SERVER-23705 Number of databases on Windows is lim... Closed
is related to SERVER-74289 Add statistics for secure allocator t... Closed
Assigned Teams:
Server Security
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0, v5.0, v4.4
Sprint: Security 2023-03-06
Participants:

 Description   

Our SecureAllocator attempts to lock memory pages so that the OS does not page them to disk. On Linux we use mlock, and on Windows we use VirtualLock.
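For readers unfamiliar with the two calls, the following is a minimal sketch of page locking on both platforms. It is not the SecureAllocator code: the helper names roundUpToPage, lockRegion and unlockRegion are hypothetical and exist only for illustration. Both mlock and VirtualLock operate on whole pages, so the sketch also shows rounding a request up to a page boundary.

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#define NOMINMAX
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical helper: round `bytes` up to a whole number of pages.
// `pageSize` must be a non-zero power of two.
inline std::size_t roundUpToPage(std::size_t bytes, std::size_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return (bytes + pageSize - 1) & ~(pageSize - 1);
}

// Hypothetical helper: ask the OS to keep [addr, addr + bytes) resident.
// Returns false if the OS refuses, for example because a locked-page
// limit has been reached.
inline bool lockRegion(void* addr, std::size_t bytes) {
#ifdef _WIN32
    return VirtualLock(addr, bytes) != 0;
#else
    return mlock(addr, bytes) == 0;
#endif
}

// Hypothetical helper: release a lock taken by lockRegion.
inline bool unlockRegion(void* addr, std::size_t bytes) {
#ifdef _WIN32
    return VirtualUnlock(addr, bytes) != 0;
#else
    return munlock(addr, bytes) == 0;
#endif
}
```

A caller would typically allocate page-aligned memory, call lockRegion on it, and treat a false return as the signal that the platform's locked-page limit needs attention.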

Windows imposes limits on the number of pages that can be locked by VirtualLock. From the documentation:

Each version of Windows has a limit on the maximum number of pages a process can lock. This limit is intentionally small to avoid severe performance degradation. Applications that need to lock larger numbers of pages must first call the SetProcessWorkingSetSize function to increase their minimum and maximum working set sizes.

In SERVER-23705, we started using the SetProcessWorkingSetSize API when we reach these limits.

Even when the pagefile is disabled, we have observed strange performance behavior where Windows moves essentially all of mongod's memory from the Active state to the Inactive state. This is accompanied by very long stalls. In FTDC, the observed effect is that mongod reports no resident memory.

My theory is that SetProcessWorkingSetSize tells the OS that certain memory is important, and by implication that everything outside of the working set is not actually important. As a result of setting the working set to a small value in order to lock pages, the OS also decides to mark the remaining resident memory as "Inactive" even if there is plenty of free memory available in the system. This has serious performance implications that we should investigate and understand.

It seems like we either need to disprove this theory, find a different API, or always use SetProcessWorkingSetSize to ensure that the entire process's memory is treated as important.



 Comments   
Comment by Githook User [ 20/Mar/23 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-74038 Grow dwMaximumWorkingSetSize with current working set size

(cherry picked from commit db5ca2947f37d6706c01fe24d6294af75b6418c9)
Branch: v6.0
https://github.com/mongodb/mongo/commit/34de12fdfd56a07d1c1d6a1c194de6e6e906d1b0

Comment by Githook User [ 20/Mar/23 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-74038 Grow dwMaximumWorkingSetSize with current working set size

(cherry picked from commit db5ca2947f37d6706c01fe24d6294af75b6418c9)
Branch: v5.0
https://github.com/mongodb/mongo/commit/e41dfbe8bf58bc8489defe42dfb72a0512f5c2fe

Comment by Githook User [ 20/Mar/23 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-74038 Grow dwMaximumWorkingSetSize with current working set size

(cherry picked from commit db5ca2947f37d6706c01fe24d6294af75b6418c9)
Branch: v4.4
https://github.com/mongodb/mongo/commit/34023f77e285f99dfd92388a05c142855d6bbd4c

Comment by Githook User [ 27/Feb/23 ]

Author:

{'name': 'Mark Benvenuto', 'email': 'mark.benvenuto@mongodb.com', 'username': 'markbenvenuto'}

Message: SERVER-74038 Grow dwMaximumWorkingSetSize with current working set size
Branch: master
https://github.com/mongodb/mongo/commit/db5ca2947f37d6706c01fe24d6294af75b6418c9

Comment by Mark Benvenuto [ 22/Feb/23 ]

Fundamentally, the issue is that the pair of calls GetProcessWorkingSetSize/SetProcessWorkingSetSize does not do what was originally expected. The original fix (SERVER-23705) addressed the problem of exhausting the process quota for locked pages but did not appreciate the side effects of dwMaximumWorkingSetSize in SetProcessWorkingSetSize. The Windows memory manager treats dwMaximumWorkingSetSize as a target working set size for the process to achieve. This means that Windows tries to forcibly eject pages until it achieves its target size. The dwMaximumWorkingSetSize that is returned by GetProcessWorkingSetSize is not the current working set size of the process. It is the original maximum working set size of the process, which applies if the system is under memory pressure and defaults to 345 pages.
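The commit message for this ticket, "Grow dwMaximumWorkingSetSize with current working set size", suggests the general shape of the remedy. The following is a minimal sketch of that idea under stated assumptions, not the code from the MongoDB commit: the names WorkingSetLimits, growLimits and growWorkingSetFor are hypothetical, and using GetProcessMemoryInfo as the source of the current working set size is an assumption made for illustration. The arithmetic is kept in a pure function so that it can be checked on any platform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical type: a minimum/maximum working set pair, in bytes.
struct WorkingSetLimits {
    std::size_t minBytes;
    std::size_t maxBytes;
};

// Hypothetical helper: compute new limits before locking `bytesToLock`.
// `quota` is what GetProcessWorkingSetSize reported; `currentWorkingSet`
// is what the process actually has resident. The maximum is based on the
// larger of the two so that Windows is not handed a target far below the
// memory the process is really using.
inline WorkingSetLimits growLimits(WorkingSetLimits quota,
                                   std::size_t currentWorkingSet,
                                   std::size_t bytesToLock) {
    assert(quota.minBytes <= quota.maxBytes);
    WorkingSetLimits next;
    // Locked pages count against the minimum, so raise it by the request.
    next.minBytes = quota.minBytes + bytesToLock;
    std::size_t base = std::max(quota.maxBytes, currentWorkingSet);
    next.maxBytes = std::max(base + bytesToLock, next.minBytes);
    return next;
}

#ifdef _WIN32
#define NOMINMAX
#include <windows.h>
#include <psapi.h>  // may require linking against psapi.lib

// Hypothetical Windows wrapper around growLimits.
inline bool growWorkingSetFor(std::size_t bytesToLock) {
    HANDLE proc = GetCurrentProcess();
    SIZE_T minWs = 0, maxWs = 0;
    if (!GetProcessWorkingSetSize(proc, &minWs, &maxWs))
        return false;
    PROCESS_MEMORY_COUNTERS pmc{};
    pmc.cb = sizeof(pmc);
    if (!GetProcessMemoryInfo(proc, &pmc, sizeof(pmc)))
        return false;
    WorkingSetLimits next =
        growLimits({minWs, maxWs}, pmc.WorkingSetSize, bytesToLock);
    return SetProcessWorkingSetSize(proc, next.minBytes, next.maxBytes) != 0;
}
#endif
```

For example, with a reported maximum of 345 pages (1413120 bytes at 4 KiB per page) and 1 GiB actually resident, growLimits yields a maximum slightly above 1 GiB instead of one near 345 pages, so the memory manager has no reason to trim the process down to the original default.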

This issue can be easily demonstrated by a custom repro using a unit test (See https://github.com/mongodb/mongo/compare/master...markbenvenuto:mongo:secure_allocator_measure?expand=1). Said repro simply tries to gobble up memory with 1% of it being "secure" memory. By observing memory consumption from GetProcessWorkingSetSize and GetProcessMemoryInfo, we can see how the working set size oscillates as our test program consumes more memory and Windows is periodically told to empty the working set.

Current Behavior

Fixed Behavior
