[SERVER-19795] mongod memory consumption higher than WT cache size (at least on Windows 2008 R2) Created: 06/Aug/15 Updated: 11/Jan/18 Resolved: 26/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marc Girollet | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | RF, WTmem | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows 2008 R2 |
| Attachments: | |
| Issue Links: | |
| Operating System: | Windows |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
Although the mongod.exe process memory is well capped, as required by --wiredTigerCacheSizeGB, the OS (Windows 2008 R2) keeps a huge part of OS memory cache active on files. See the attached screenshot from the RAMMap tool.
Using "--wiredTigerEngineConfigString direct_io=[data]" is a workaround for this problem, but it leads to too much slowness for queries and in fact is not applicable for our volume of processing and data. Could you make --wiredTigerCacheSizeGB take into account the whole RAM used because of mongod (OS + process), please? Mongo64_2008+\mongod.exe" --port 4444 --dbpath D:\Homeware\XOne_services\PreBETA\MongoData\CurvesFR --directoryperdb --journal --nohttpinterface --wiredTigerCacheSizeGB 1 --wiredTigerDirectoryForIndexes --replSet MongoServiceCacheCurves --oplogSize 1024 --storageEngine wiredTiger --auth --keyFile x.keyfile |
| Comments |
| Comment by Nick Judson [ 19/Oct/15 ] | ||||||||||||
|
@Michael Cahill - I found this an interesting read: http://winntfs.com/2012/11/29/windows-write-caching-part-2-an-overview-for-application-developers/ ..."Some well known applications such as Microsoft SQL and the Microsoft JET database (ships with Windows Server SKUs) specify FILE_FLAG_NO_BUFFERING with the CreateFile API."... Is there a flag where we can get verbose output from WT (and track the frequency of flushes etc.)? | ||||||||||||
| Comment by Nick Judson [ 07/Oct/15 ] | ||||||||||||
|
Thanks for looking @Michael Cahill - appreciated. I don't know if/what the fix might be, and unfortunately I don't have a C++ build environment. Looking at the Windows docs and experimenting with artificially restricted system caches doesn't reveal much of interest - other than for my workload it doesn't appear to provide any benefit. I don't see how it's possible to restrict the system file cache by process, so it may be out of your hands. Any in-house Windows experts able to comment? The part I find odd is that the cache never seems to be released. I'm wondering if there is something non-standard about the way WT creates/locks files. | ||||||||||||
| Comment by Michael Cahill (Inactive) [ 06/Oct/15 ] | ||||||||||||
|
nick@innsenroute.com, the os_cache_max setting in WiredTiger currently relies on posix_fadvise: we don't have a Windows implementation, so it will not have any effect. I'm genuinely sorry that you are having problems using WiredTiger on Windows. It is a good sign that enabling direct I/O helps in some cases: I suspect that relies on the I/O subsystem being fast enough that reads/writes don't stall when going to disk synchronously. I did a review today of what interfaces are available on Windows and how WiredTiger uses them. One thing I noticed is that WiredTiger's direct_io setting does two things on Windows: it sets FILE_FLAG_NO_BUFFERING to disable the filesystem cache and FILE_FLAG_WRITE_THROUGH to make writes synchronous to disk. I don't think WiredTiger needs FILE_FLAG_WRITE_THROUGH for correctness: we will also call FlushFileBuffers for durability. Given that, it may be worth trying performance runs without write-through semantics. Unfortunately, doing that requires a source code change:
If anyone is prepared to give this a try, please let me know. | ||||||||||||
| Comment by Nick Judson [ 02/Oct/15 ] | ||||||||||||
|
I'm wondering about the os_cache_max setting in WT. If that's settable, it would be interesting to try. | ||||||||||||
| Comment by Stephen JANNIN [ 02/Oct/15 ] | ||||||||||||
|
During our activation of WiredTiger in production in July, we tried to use direct_io for a few days, but we had many slow requests. Memory no longer leaked, but performance was catastrophic. | ||||||||||||
| Comment by Nick Judson [ 02/Oct/15 ] | ||||||||||||
|
Retest with direct IO off took 7h:32m vs. 5h:48m with it on, a ~30% speed difference. Direct IO also uses 6GB less physical RAM. | ||||||||||||
| Comment by Nick Judson [ 01/Oct/15 ] | ||||||||||||
|
Ramon, there seem to be a few issues at play here: 1. MongoDB memory usage > WT cache size due to the TCMalloc cache going over 1GB. To reiterate, for a pleb such as myself, it seems odd that MongoDB soaks up all the system memory when other databases I've used do not. Surely this cannot be the expected behavior (this ticket). If you want me to create a ticket for 2B I will, but it seems strongly correlated to this ticket. | ||||||||||||
| Comment by Ramon Fernandez Marina [ 01/Oct/15 ] | ||||||||||||
|
nick@innsenroute.com, will you please open a separate ticket and post your results in it? Whether better memory management is needed on Windows, or whether it makes sense to make direct I/O the default on this platform, these are broader topics than the behavior/bug described in this ticket (using --wiredTigerCacheSizeGB to limit mongod's memory consumption). Thanks, | ||||||||||||
| Comment by Nick Judson [ 01/Oct/15 ] | ||||||||||||
|
ok - so I ran a test last night with my standard workload on 3.1.8 with WT cache size set to 4 GB and direct IO turned on. Fastest run I have ever seen - finishing in 5h:48m - fantastic, consistent performance (see attached). Lots of system memory left idle. I'm currently running the same test without direct IO. Initially, the speeds are faster, but an hour or so into the test the speeds have dropped by 25%. The OS has paged over 6GB of WT files into memory and is bumping up against the physical memory limit. The direct IO off test is still running, but when it finishes I'll upload the perf chart for that. | ||||||||||||
| Comment by Nick Judson [ 29/Sep/15 ] | ||||||||||||
|
I also tested mongod releasing memory back to the OS. Edit: it took a while, but it did release half of the process memory, though none of the file cache memory. I killed mongod and the WT file cache was released, and SQL Server climbed back up to 1GB (see screenshot). | ||||||||||||
| Comment by Nick Judson [ 29/Sep/15 ] | ||||||||||||
|
A few notes for a baseline: I configured SQL Server with a 1GB cap and ran my test. See the SqlServer attachment, which shows overall memory usage of 5.2GB, with almost nothing cached by the OS and 1GB of physical RAM in use by the SQL process. I ran the same test on 3.1.8 with the WT cache set to 1GB (see the Mongo_318 attachment). It shows overall usage at 7GB, with a 2.5GB OS cache and mongod using 1.3GB. SQL = (system usage) + 1GB = 5.2GB. It's difficult to illustrate, but in the SQL scenario there is less memory pressure and other processes are consuming more memory; in the Mongo scenario, many of those same processes are using less memory. I suspect if that were taken into account, the system memory would be closer to 8GB for mongod. So in this short test, MongoDB is consuming 3.8GB of RAM with the WT cache size set to 1GB, which is 1.8 to 2.8GB more memory than SQL Server restricted to 1GB. I will re-test on my work machine, which is much beefier, but from my earlier results it appears the OS cache is 5GB in that scenario. From my notes, it appears that on 3.1.7 with the WT cache set to 3GB, the actual usage is 10GB! | ||||||||||||
| Comment by Nick Judson [ 28/Sep/15 ] | ||||||||||||
|
And one totally unrelated comment: "...Note that setting a large value for the WiredTiger cache to improve performance..." isn't necessarily true. I was surprised to learn that lowering the WT cache size for insert-heavy workloads had a large positive impact on performance. | ||||||||||||
| Comment by Nick Judson [ 28/Sep/15 ] | ||||||||||||
|
A few comments: 1. Use --wiredTigerCacheSizeGB to limit the WiredTiger cache. This works. As previously mentioned, I will attempt to run some further analysis and perhaps compare this behavior with SQL Server. Is this something @Mark Benvenuto could comment on? | ||||||||||||
| Comment by Stephen JANNIN [ 28/Sep/15 ] | ||||||||||||
|
I disagree too. We cannot use WiredTiger in these conditions. | ||||||||||||
| Comment by Nick Judson [ 26/Sep/15 ] | ||||||||||||
|
Ramon, I disagree. I'll perform further testing but I don't think this makes sense. | ||||||||||||
| Comment by Ramon Fernandez Marina [ 26/Sep/15 ] | ||||||||||||
|
marc.girollet@sgcib.com, if my understanding of the issue description above is correct, this is expected behavior: --wiredTigerCacheSizeGB only limits the WiredTiger cache, not the total amount of memory consumed by mongod. There is currently no setting to globally limit the amount of memory used by mongod. Also, the OS may use additional memory for buffering, which may be released to other processes if there's memory pressure. Note that setting a large value for the WiredTiger cache to improve performance will reduce the amount of memory the OS may use for buffering, which may have a negative effect on performance. Regards, | ||||||||||||
| Comment by Nick Judson [ 25/Aug/15 ] | ||||||||||||
|
[ShutDownMongoD.png] This shows the memory drop when mongod is shut down (~10G). Task manager details tab shows the working set to be ~5G. | ||||||||||||
| Comment by Eitan Klein [ 25/Aug/15 ] | ||||||||||||
|
Attached a file that highlights the visibility of this issue for Windows users. | ||||||||||||
| Comment by Nick Judson [ 25/Aug/15 ] | ||||||||||||
|
Mods, please link this to https://jira.mongodb.org/browse/WT-1990?jql=text%20~%20%22Cache%20windows%22 which appears to be the original ticket. | ||||||||||||
| Comment by Nick Judson [ 25/Aug/15 ] | ||||||||||||
|
I also see this behavior on my systems. Even though in Task Manager Mongo will appear to use 5GB (even with CacheSize set to 3GB, but that is a different issue), the actual memory usage will be close to double that. RamMap doesn't yet support Windows 10, but it's easy to see that overall system memory usage drops by close to double what is reported by the Task Manager details tab when mongod is shut down. I don't think this is the correct behavior; for example, a capped SQL Server instance does not appear to use the OS cache in this manner (i.e., a cap of 4GB = 4GB physical memory usage). I'm working with Eitan on some other related issues but I've mentioned this to him. | ||||||||||||
| Comment by Stephen JANNIN [ 07/Aug/15 ] | ||||||||||||
|
Comparing the wiredTiger source code and the mmapv1 source code, I have a few remarks: 1/ There are differences in the way we create and use memory-mapped files.
2/ wiredTiger: you never call FlushViewOfFile nor FlushFileBuffers, whereas I think something happens in mmapv1, maybe in data_file_sync.cpp, where we call flushAllFiles in a loop in a thread. | ||||||||||||
| Comment by Marc Girollet [ 07/Aug/15 ] | ||||||||||||
|
Hi Michael, a stable case is leaking 1,172,768K (previous screenshot), with updates (100 per min) and queries (150 per min). Attached: activity graph, server status. -------------------- We managed to restrict the OS cache use (see http://www.uwe-sieber.de/ntcacheset_e.html); we used SetSystemFileCacheSize. The point is that we didn't have to do this with the mmap engine | ||||||||||||
| Comment by Michael Cahill (Inactive) [ 06/Aug/15 ] | ||||||||||||
|
marc.girollet@sgcib.com, I have moved this ticket to the SERVER project, which deals with MongoDB issues. Note that the WiredTiger cache is not designed to include all sources of memory allocated by mongod. WiredTiger itself allocates memory for various purposes other than the page cache, such as for buffering log records and caching keys and values in cursors. To make progress with the pattern of memory use you are seeing, please include more information about the version of MongoDB you are running and the workload that is causing this behavior. The direct I/O configuration is included to avoid having the operating system cache filesystem buffers, but that is intended for performance tuning rather than to avoid out-of-memory issues. |