[SERVER-17386] Cursor cache causes excessive memory utilization in WiredTiger Created: 26/Feb/15  Updated: 19/Sep/15  Resolved: 29/Jun/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0-rc10
Fix Version/s: 3.0.5, 3.1.5

Type: Bug Priority: Major - P3
Reporter: Nick Judson Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 3
Labels: WTmem
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mongo.png     PNG File mongoMem.png    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-17456 Mongodb 3.0 wiredTiger storage engine... Closed
is duplicated by SERVER-18821 64 sessions created in single threade... Closed
is duplicated by SERVER-19273 Short range scan using YCSB cause mon... Closed
is duplicated by SERVER-19562 Restore error: insertion error with o... Closed
Related
related to SERVER-17364 Idle cursors shouldn't hold storage e... Closed
is related to NODE-487 WiredTiger memory leak Closed
is related to SERVER-17424 WiredTiger uses substantially more me... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Quint Iteration 3.1.2, Quint Iteration 3, Quint Iteration 5
Participants:

 Description   
Issue Status as of Jul 14, 2015

ISSUE SUMMARY
WiredTiger allows MongoDB to open an unbounded number of WiredTiger sessions, which means that despite a limited number of WiredTiger cursors per session, the total number of cached sessions and cursors may lead to excessive memory use.

USER IMPACT
Under some workloads, MongoDB with WiredTiger may consume substantial memory beyond the normal WiredTiger database cache, potentially triggering out-of-memory conditions and even the OOM killer.

WORKAROUNDS
None.

AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.4.

FIX VERSION
The fix is included in the 3.0.5 production release.

Original description

Running RC10 with the c# driver, 50 threads concurrently bulk inserting/updating/reading, I see stacktraces in the console output followed by a big speed drop. The speed drop appears to be related to reads, but I can't be sure yet.

I will attempt to reproduce this with a long running test and report back.



 Comments   
Comment by Githook User [ 29/Jun/15 ]

Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

Message: SERVER-17386 WiredTiger session cache improvements
Branch: master
https://github.com/mongodb/mongo/commit/91c9eaeae7dbe3ff5e5e5b369871562beda908a8

Comment by Githook User [ 29/Jun/15 ]

Author: Michael Cahill <michael.cahill@mongodb.com> (michaelcahill)

Message: SERVER-17386 WiredTiger session cache improvements
Branch: v3.0
https://github.com/mongodb/mongo/commit/49f1cf81a2aa87792e000f73816015f3fd16b4a6

Comment by Martin Bligh [ 17/Jun/15 ]

Wondering if it's worth benchmarking together with the other intended change (unless you think it's not a good idea):
"Limit the number of slots on WiredTigerSessionCache to less than 64 slots (4 or 8)."
If you're not doing this already, we will have to run them all again in a few days.
It seems like a tradeoff between cache affinity and contention; I'd think we want it as small as possible without causing a lot of contention (on most loads).

Comment by Michael Cahill (Inactive) [ 17/Jun/15 ]

There could be a bunch of things involved in the 128 thread case, not least of which is that we're running the client on the same machine as the server and they're competing for the same CPU.

We've been doing more runs today, including the mms.js script from david.hows and the hammer single_query.sh workload. They show a consistent drop in the number of open sessions (from 148 to 28 with hammer).

There is still some tweaking of the cursor cache required: if I set it too small (e.g., 10 cursors), there is an approximately 10% slowdown in the hammer workload as cursors are closed and reopened. If I make the cursor cache big enough to hold all 48 cursors that operations use, there is no performance impact from the change (in fact it seems marginally faster). Even with that larger cache, there is still a reduction in the total number of open cursors, from 7412 to 1095.

I'll get this pushed into code review tomorrow: david.hows is still chasing a bug that I can't reproduce.

Comment by David Hows [ 17/Jun/15 ]

Should also add that not all is bad: there is a massive drop in resource consumption in the 128-thread case mentioned above. The session count is about what we would expect for 128 threads; the 10 leftover sessions are those created outside the cache by various internal components at boot. Overall, 10x fewer cursors and 2-3x fewer sessions.

Vanilla MongoDB

	"session" : {
		"open cursor count" : 12398,
		"open session count" : 366
	},

Patched

	"session" : {
		"open cursor count" : 1398,
		"open session count" : 138
	},

Comment by Martin Bligh [ 16/Jun/15 ]

michael.cahill, david.hows - sounds good. Per discussion yesterday (including Alex), I think we expect some minor perf regressions in exchange for stability, so this shouldn't be a blocking problem, but it's probably worth poking at them a bit to check causes etc. (e.g., in the 1-thread case above?)

Per David's last comment, is the 128-thread case the sched_yield thing? I seem to recall --nojournal also making a large difference at high thread counts for hammer insert.

Comment by David Hows [ 16/Jun/15 ]

I've got some initial changes put together for #2 and #4 of Martin's list, but still need to do a little more work to confirm if there is a performance impact and if so, look at how to avoid a performance regression.

The current benchmarks I've done show the following (number of threads vs. execution time). I want to investigate the high-thread-count cases, as those look like a regression.

Patched

1	14362.742176929787
4	44660.264345779105
16	45461.05404115101
128	29478.12966445289

Vanilla

1	16118.435705814753
4	46082.747979357846
16	45675.32389163103
128	28135.928187454814

Comment by Michael Cahill (Inactive) [ 16/Jun/15 ]

martin.bligh, I'm planning to work with david.hows on 2 and 4. He's got preliminary code with a non-locking implementation of a single-slot session cache.

We'll report back with the impact of these changes on memory use and performance for some set of workloads. Will that get in your way?

Comment by Martin Bligh [ 15/Jun/15 ]

Intent for 3.0 after discussion between WT and Storage teams.

1. Limit the number of slots in WiredTigerSessionCache to fewer than 64 (4 or 8). May be a slight perf impact, but an order of magnitude less memory consumption.
2. Bound the number of WT cursors cached per session to 16 on release (zero may also be a possibility).
3. Idle client cursors should not hold storage engine (or WT) resources.
4. Investigate picking the slot to use in WiredTigerSessionCache more efficiently (David has a patch which seems to have no downside ...)

All numbers are subject to change post-benchmarking

Other thoughts for further investigation include:

A. Remove the WiredTigerSession cache by tying the WiredTigerSession directly to the Mongo Session
B. Making the WiredTigerSessionCache a single "slot" again and changing the data structure to something more performant instead.

Comment by Daniel Pasette (Inactive) [ 20/Mar/15 ]

mdcallag, only Linux uses tcmalloc in 3.0. Previously OS X also used tcmalloc.

Comment by Mark Callaghan [ 16/Mar/15 ]

Just curious, does this mean mongod doesn't use tcmalloc or jemalloc on Windows?
"Windows has a single lock on its heap. I even ran the repro with two heaps in mongod, one heap for mongodb, and a separate heap for WT."

Comment by Mark Benvenuto [ 01/Mar/15 ]

The repro consists of about 100 connections with a mix of inserts and updates across 150 collections in a single database.

There are a few issues. The "memory leak" occurs because of a session & cursor "leak". Even after fixing the session/cursor leak, there may be other issues.

We can see the leak from the WT counters. These are the current open-session and open-cursor counters. They are not the sessions/cursors currently in use in a transaction (that count would be lower), just the ones that are open. Note that these counters are accurate; there appear to be no bugs in the counters themselves. I instrumented the code to confirm the session count, for instance.

       "session" : {
               "open cursor count" : 125461,
               "open session count" : 446
       },

Results of Investigations

  1. Heavy Windows Heap Contention after the cache size is reached
  2. WiredTigerSessionCache has significant performance problems and suffers from cache anti-patterns.
  3. The WiredTigerSession cursor cache needs to be redesigned.

Heap Contention
Windows has a single lock on its heap. I even ran the repro with two heaps in mongod: one heap for mongodb, and a separate heap for WT. The primary source of all the heap contention is WT, not the rest of mongod's allocations. We have seen this before in some of our other testing; it is an opportunity for improvement, but not a blocking issue at the moment.

WiredTigerSessionCache
The WiredTigerSessionCache is trying to cache WT_SESSION objects from WT. The problem is that as part of the work to avoid lock contention, its effectiveness as a cache has suffered.

        const int cachePartition = cachePartitionGen.addAndFetch(1) % NumSessionCachePartitions;

The cache is divided into 64 chunks. This one line ensures that a thread will never hit the same session cache entry twice in a row. It also means that if a customer had a single-threaded application, we would go through 64 WT_SESSIONs before a thread ever actually reuses a single session, since there is no CPU or thread locality in the cache. The cachePartitionGen.addAndFetch(1) might as well be a call to random.

Also, in this repro, the number of open sessions has grown to 446, yet there are only 100 connections. This shows how the round-robin between buckets has caused the session count to greatly exceed the number of connections.

A better approach would be GetCurrentProcessorNumber (Windows) or sched_getcpu (Linux). See this page for other alternatives.

WiredTigerSession Cursor Cache
The cursor cache is designed as a map<wt table id, vector< cursor > > where each vector is capped at 10 cursors. The map itself is not limited in size. This means the cursor cache can have a max size of 10 * N, where N is the number of WT tables (i.e., mongo collections + indexes). This cursor cache is per WiredTigerSession, and since there can be an unbounded number of WiredTigerSessions, the total number of open cursors can grow very large.

Ideally, an MRU data structure would be used instead, like those available in Boost.MultiIndex.

Comment by Daniel Pasette (Inactive) [ 27/Feb/15 ]

mongod should not generally be using that much extra memory over the configured cache size. If you are able to modify the test program from SERVER-17299 in a way that can trigger this condition, that would be extremely helpful, of course. If you can compress and attach the log file from a run that hits this issue, that would also be helpful.

Thanks again for the report.

Comment by Nick Judson [ 26/Feb/15 ]

And one more piece of information - the stack traces always seem to happen at shutdown, and seem to happen twice. The last of the two appears immediately before the ^C is logged (in my log files).

Comment by Nick Judson [ 26/Feb/15 ]

So mongod must have some leeway with the --wiredTigerCacheSizeGB flag. As a test I set it to 2 GB and it's now at 2.9 GB [and now up to 4 GB] in both mongostat and Task Manager.

Not sure if this is expected or perhaps a slow memory leak?

Comment by Nick Judson [ 26/Feb/15 ]

I have been able to repro this. Here are some details:

- WiredTiger RC10
- i7 5960X, 16 GB RAM, SSDs
- Windows 8.1 64-bit

I'm using the following command line:

mongod --dbpath=e:\mongo --storageEngine wiredTiger --wiredTigerCacheSizeGB=6 --logpath=c:\mongo\mongo.log

It appears that mongod is using > 6 GB (see screenshot), so system paging kicks in when it hits ~9 GB. This obviously slows things down quite a bit, but it also seems to coincide with the stacktrace in the first screenshot. See below: the exceptions appear to be thrown at shutdown...I just happen to stop mongod when speeds drop.

I have a 110MB log file, which shows exactly the same as the existing screenshot.

Comment by Nick Judson [ 26/Feb/15 ]

I don't think I have any logs - this was run in the console window without a log target argument. Would MMS have anything useful? I will attempt to repro when I get a minute (with logging to a file).

I can't share the code unfortunately. I have a sample which is attached to SERVER-17299 but it isn't a full representation of what I'm doing. I'll see if I can build it out for you.

Comment by Ramon Fernandez Marina [ 26/Feb/15 ]

nickj, can you please upload full server logs in text? Also, is it possible for you to share the code you're using to trigger these messages so we can try to reproduce them on our end?

Thanks,
Ramón.

Generated at Thu Feb 08 03:44:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.