[SERVER-17386] Cursor cache causes excessive memory utilization in WiredTiger Created: 26/Feb/15 Updated: 19/Sep/15 Resolved: 29/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.0-rc10 |
| Fix Version/s: | 3.0.5, 3.1.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nick Judson | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 3 |
| Labels: | WTmem | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||
| Operating System: | ALL | ||
| Backport Completed: | | ||
| Sprint: | Quint Iteration 3.1.2, Quint Iteration 3, Quint Iteration 5 | ||
| Participants: | | ||
| Description |
| Comments |
| Comment by Githook User [ 29/Jun/15 ] | ||||||||
|
Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com> Message: | ||||||||
| Comment by Githook User [ 29/Jun/15 ] | ||||||||
|
Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com> Message: | ||||||||
| Comment by Martin Bligh [ 17/Jun/15 ] | ||||||||
|
Wondering if it's worth benchmarking together with the other intended change (unless you think it's not a good idea): | ||||||||
| Comment by Michael Cahill (Inactive) [ 17/Jun/15 ] | ||||||||
|
There could be a bunch of things involved in the 128-thread case, not least of which is that we're running the client on the same machine as the server and they're competing for the same CPU. We've been doing more runs today, including the mms.js script from david.hows and the hammer single_query.sh workload. They show a consistent drop in the number of open sessions (from 148 to 28 with hammer). There is still some tweaking of the cursor cache required: if I set it too small (e.g., 10 cursors), there is an approximately 10% slowdown in the hammer workload as cursors are closed and reopened. If I make the cursor cache big enough to hold all 48 cursors that operations use, there is no performance impact from the change (in fact it seems marginally faster). Even with that larger cache, there is still a reduction in the total number of open cursors, from 7412 to 1095. I'll get this pushed into code review tomorrow: david.hows is still chasing a bug that I can't reproduce. | ||||||||
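For context, a minimal sketch of the kind of bounded, per-session cursor cache being tuned here; the class, the eviction policy, and the default of 48 slots are illustrative assumptions, not the actual server code.

```cpp
#include <string>
#include <vector>
#include <wiredtiger.h>

// Hypothetical bounded cursor cache: keeps up to _maxCursors open WT_CURSORs
// per session, keyed by table URI, and closes the oldest entry when full
// instead of letting open cursors accumulate without bound.
class CursorCache {
public:
    explicit CursorCache(WT_SESSION* session, size_t maxCursors = 48)
        : _session(session), _maxCursors(maxCursors) {}

    // Return a cached cursor for `uri` if one exists, otherwise open a new one.
    WT_CURSOR* getCursor(const std::string& uri) {
        for (auto it = _cache.begin(); it != _cache.end(); ++it) {
            if (it->uri == uri) {
                WT_CURSOR* c = it->cursor;
                _cache.erase(it);  // the caller owns it until releaseCursor()
                return c;
            }
        }
        WT_CURSOR* c = nullptr;
        int ret = _session->open_cursor(_session, uri.c_str(), nullptr, nullptr, &c);
        return ret == 0 ? c : nullptr;
    }

    // Return a cursor to the cache; evict the oldest cached cursor if full.
    void releaseCursor(const std::string& uri, WT_CURSOR* cursor) {
        cursor->reset(cursor);  // release positions/resources held by the cursor
        if (_cache.size() >= _maxCursors) {
            WT_CURSOR* oldest = _cache.front().cursor;
            oldest->close(oldest);  // cache is full: actually close one cursor
            _cache.erase(_cache.begin());
        }
        _cache.push_back({uri, cursor});
    }

private:
    struct Entry {
        std::string uri;
        WT_CURSOR* cursor;
    };
    WT_SESSION* _session;
    size_t _maxCursors;
    std::vector<Entry> _cache;  // oldest entries at the front
};
```

With a cache sized below the working set (e.g., 10), the close/reopen path in releaseCursor/getCursor is hit constantly, which matches the ~10% slowdown described above.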
| Comment by David Hows [ 17/Jun/15 ] | ||||||||
|
I should also add that not all is bad: there is a massive drop in resource consumption in the 128-thread case mentioned above, roughly 10x fewer cursors and 2-3x fewer sessions. The session count is about what we would expect for 128 threads; the 10 leftover sessions are those created outside the cache by various internal components on boot.

Vanilla MongoDB

Patched
| ||||||||
| Comment by Martin Bligh [ 16/Jun/15 ] | ||||||||
|
michael.cahill, david.hows - sounds good. Per discussion yesterday (including Alex) I think we expect some minor perf regressions in exchange for stability, so this shouldn't be a blocking problem, but it's probably worth poking at them a bit to check the causes etc. (e.g. in the 1-thread case above?). Per David's last comment, is the 128-thread case the sched_yield thing? ISTR --nojournal also making a large difference at high thread counts for hammer insert. | ||||||||
| Comment by David Hows [ 16/Jun/15 ] | ||||||||
|
I've got some initial changes put together for #2 and #4 of Martin's list, but I still need to do a little more work to confirm whether there is a performance impact and, if so, look at how to avoid a performance regression. The current benchmarks I've run show the following (number of threads vs. execution time), so I want to investigate the high-thread-count cases, as those seem to be where things look like a regression.

Patched

Vanilla
| ||||||||
| Comment by Michael Cahill (Inactive) [ 16/Jun/15 ] | ||||||||
|
martin.bligh, I'm planning to work with david.hows on 2 and 4. He's got preliminary code with a non-locking implementation of a single-slot session cache. We'll report back with the impact of these changes on memory use and performance for some set of workloads. Will that get in your way? | ||||||||
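For illustration only, a minimal sketch of what a non-locking, single-slot session cache could look like using an atomic pointer swap; the class and method names are hypothetical and this is not the actual patch.

```cpp
#include <atomic>
#include <wiredtiger.h>

// Hypothetical single-slot session cache: at most one idle WT_SESSION is kept
// around, and it is claimed or returned with a lock-free atomic exchange.
class SingleSlotSessionCache {
public:
    explicit SingleSlotSessionCache(WT_CONNECTION* conn) : _conn(conn), _slot(nullptr) {}

    ~SingleSlotSessionCache() {
        WT_SESSION* s = _slot.exchange(nullptr);
        if (s)
            s->close(s, nullptr);
    }

    // Take the cached session if one is available, otherwise open a new one.
    WT_SESSION* getSession() {
        WT_SESSION* s = _slot.exchange(nullptr);
        if (s)
            return s;
        int ret = _conn->open_session(_conn, nullptr, nullptr, &s);
        return ret == 0 ? s : nullptr;
    }

    // Return a session; if the slot is already occupied, close the extra
    // session rather than letting idle sessions pile up.
    void releaseSession(WT_SESSION* session) {
        WT_SESSION* expected = nullptr;
        if (!_slot.compare_exchange_strong(expected, session))
            session->close(session, nullptr);
    }

private:
    WT_CONNECTION* _conn;
    std::atomic<WT_SESSION*> _slot;
};
```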
| Comment by Martin Bligh [ 15/Jun/15 ] | ||||||||
|
Intent for 3.0 after discussion between WT and Storage teams:

1. Limit the number of slots on WiredTigerSessionCache to less than 64 slots (4 or 8). May be a slight perf impact, but an order of magnitude less memory consumption.

All numbers are subject to change post-benchmarking.

Other thoughts for further investigation include:

A. Remove the WiredTigerSession cache by tying the WiredTigerSession directly to the Mongo Session (see the sketch below). | ||||||||
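As a rough illustration of idea A, a hypothetical RAII wrapper could tie one WT_SESSION's lifetime to the owning Mongo-level session and bypass the shared cache entirely; the class name and ownership model below are assumptions, not the agreed design.

```cpp
#include <stdexcept>
#include <wiredtiger.h>

// Hypothetical wrapper tying a WT_SESSION's lifetime directly to the owning
// Mongo-level session object: opened on construction, closed on destruction,
// never placed in any shared cache.
class OwnedWiredTigerSession {
public:
    explicit OwnedWiredTigerSession(WT_CONNECTION* conn) {
        if (conn->open_session(conn, nullptr, nullptr, &_session) != 0)
            throw std::runtime_error("failed to open WT_SESSION");
    }

    ~OwnedWiredTigerSession() {
        if (_session)
            _session->close(_session, nullptr);
    }

    // Non-copyable: exactly one owner per WT_SESSION.
    OwnedWiredTigerSession(const OwnedWiredTigerSession&) = delete;
    OwnedWiredTigerSession& operator=(const OwnedWiredTigerSession&) = delete;

    WT_SESSION* get() const { return _session; }

private:
    WT_SESSION* _session = nullptr;
};
```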
| Comment by Daniel Pasette (Inactive) [ 20/Mar/15 ] | ||||||||
|
mdcallag, only Linux uses tcmalloc in 3.0. Previously OS X also used tcmalloc. | ||||||||
| Comment by Mark Callaghan [ 16/Mar/15 ] | ||||||||
|
Just curious, does this mean mongod doesn't use tcmalloc or jemalloc on Windows? | ||||||||
| Comment by Mark Benvenuto [ 01/Mar/15 ] | ||||||||
|
The repro consists of about 100 connections with a mix of inserts and updates across 150 collections in a single database. There are a few issues. The "memory leak" occurs because of a session and cursor "leak"; if we fix the session/cursor leak, there may be other issues though. We can see the leak from the WT counters, namely the current open sessions and open cursors counters. These are not the sessions/cursors currently in use in a transaction (that count would be lower), just the ones that are open. Note that these counters are accurate; there appear to be no bugs in the counters themselves. I instrumented the code to confirm the session count, for instance.
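For reference, a minimal sketch of how these connection-level counters can be read directly through a WiredTiger statistics cursor (error handling is simplified); the same counters also appear in db.serverStatus().wiredTiger output.

```cpp
#include <cstdio>
#include <wiredtiger.h>

// Sketch: dump the connection-level statistics (which include the
// "open session count" and "open cursor count" values mentioned above)
// by iterating a WiredTiger statistics cursor.
int dumpConnectionStats(WT_SESSION* session) {
    WT_CURSOR* stats;
    int ret = session->open_cursor(session, "statistics:", nullptr, nullptr, &stats);
    if (ret != 0)
        return ret;

    const char* desc;
    const char* printableValue;
    int64_t value;
    while ((ret = stats->next(stats)) == 0) {
        stats->get_value(stats, &desc, &printableValue, &value);
        std::printf("%s = %lld\n", desc, static_cast<long long>(value));
    }
    stats->close(stats);
    return ret == WT_NOTFOUND ? 0 : ret;  // WT_NOTFOUND just means end of iteration
}
```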
Results of Investigations

Heap Contention

WiredTigerSessionCache

The cache is divided into 64 chunks. This one line ensures that a thread will never hit the same session cache entry twice. It also means that if a customer had a single-threaded application, we would go through 64 WT_SESSIONs before a thread ever actually reuses a single session, since there is no CPU or thread locality in the cache. The cachePartitionGen.addAndFetch(1) might as well be a call to random. Also, in this repro the number of open sessions has grown to 446, yet there are only 100 connections. This also shows how the round robin between buckets has caused the session count to greatly exceed the number of connections. A better approach would be GetCurrentProcessorNumber/sched_getcpu. See this page for other alternatives.

WiredTigerSession Cursor Cache

Ideally, an MRU data structure would be used instead, like those available in Boost Multi Index. | ||||||||
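A rough sketch of the two bucket-selection policies being contrasted here; the surrounding class is hypothetical, and only the indexing policy is the point.

```cpp
#include <atomic>
#include <cstddef>

#if defined(_WIN32)
#include <windows.h>   // GetCurrentProcessorNumber
#else
#include <sched.h>     // sched_getcpu (Linux; g++ defines _GNU_SOURCE by default)
#endif

// Hypothetical partitioned session cache illustrating the bucket-selection
// policies discussed above.
class PartitionedSessionCache {
public:
    static constexpr size_t kNumPartitions = 64;

    // Behavior as described: a global round-robin counter, so successive calls
    // land on different partitions and a single-threaded client walks through
    // all 64 partitions before it ever reuses a session.
    size_t pickPartitionRoundRobin() {
        return _cachePartitionGen.fetch_add(1) % kNumPartitions;
    }

    // Suggested alternative: index by the CPU the thread is currently running
    // on, so threads tend to hit the same partition and sessions get reused.
    size_t pickPartitionByCpu() {
#if defined(_WIN32)
        return GetCurrentProcessorNumber() % kNumPartitions;
#else
        int cpu = sched_getcpu();
        return (cpu < 0 ? 0 : static_cast<size_t>(cpu)) % kNumPartitions;
#endif
    }

private:
    std::atomic<size_t> _cachePartitionGen{0};
};
```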
| Comment by Daniel Pasette (Inactive) [ 27/Feb/15 ] | ||||||||
|
mongod should not generally be using that much extra memory over the configured cache size. If you are able to modify the test program from

Thanks again for the report. | ||||||||
| Comment by Nick Judson [ 26/Feb/15 ] | ||||||||
|
And one more piece of information - the stack traces always seem to happen at shutdown, and seem to happen twice. The last of the two appears immediately before the ^C is logged (in my log files). | ||||||||
| Comment by Nick Judson [ 26/Feb/15 ] | ||||||||
|
So mongod must have some leeway with the --wiredTigerCacheSizeGB flag. As a test I set it to 2GB and it's now at 2.9 GB [and now up to 4GB] in both mongostat and task manager. Not sure if this is expected or perhaps a slow memory leak? | ||||||||
| Comment by Nick Judson [ 26/Feb/15 ] | ||||||||
|
I have been able to repro this. Here are some details:

- wiredTiger rc10

I'm using the following command line:

mongod --dbpath=e:\mongo --storageEngine wiredTiger --wiredTigerCacheSizeGB=6 --logpath=c:\mongo\mongo.log

It appears that mongod is using > 6GB (see screenshot), so system paging kicks in when it hits ~9GB. This, obviously, slows things down quite a bit. I have a 110MB log file, which shows exactly the same as the existing screenshot. | ||||||||
| Comment by Nick Judson [ 26/Feb/15 ] | ||||||||
|
I don't think I have any logs - this was run in the console window without a log target argument. Would MMS have anything useful? I will attempt to repro when I get a minute (with logging to a file). I can't share the code unfortunately. I have a sample which is attached to | ||||||||
| Comment by Ramon Fernandez Marina [ 26/Feb/15 ] | ||||||||
|
nickj, can you please upload full server logs in text? Also, is it possible for you to share the code you're using to trigger these messages so we can try to reproduce them on our end? Thanks, |