[SERVER-37795] WiredTiger again uses way more memory than instructed by cache limit in 3.6 Created: 29/Oct/18  Updated: 27/Oct/23  Resolved: 22/Nov/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.6.7, 3.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Adrien Jarthon Assignee: Alexander Gorrod
Resolution: Works as Designed Votes: 0
Labels: dmd-perf, storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File excess.png     PNG File image-2018-10-29-09-22-50-604.png     PNG File mongodb-3.6-memory-regression.png     File server_status.json    
Issue Links:
Related
related to SERVER-37926 Excess allocated memory associated wi... Closed
related to WT-4418 Don't keep key/value memory buffers a... Closed
Operating System: ALL
Sprint: Storage Engines 2018-11-19, Storage Engines 2018-12-03
Participants:
Case:

 Description   

As I said in https://jira.mongodb.org/browse/SERVER-17424?focusedCommentId=2044120&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2044120, I noticed yet another regression in 3.6, this time in memory usage (after the secondary CPU × 4 increase and the IO usage × 5 increase): mongod now uses almost twice the configured cacheSize, and usage has increased a lot since 3.4.15:

I first upgraded to 3.6.7 and then 3.6.8 after https://jira.mongodb.org/browse/SERVER-35958?focusedCommentId=2007204&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-2007204

My configured max cache is 10G (server has 32G RAM):

rs0:PRIMARY> db.serverStatus().version
3.6.8
rs0:PRIMARY> db.serverStatus().wiredTiger.cache["maximum bytes configured"]
10737418240
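
For reference, a limit like this is normally set via storage.wiredTiger.engineConfig.cacheSizeGB. A sketch of the relevant mongod.conf section, assuming the YAML config format (the actual config file isn't included in this report):

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 10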

And my actual usage is 19G RES (21G VIRT):

rs0:PRIMARY> db.serverStatus().tcmalloc.tcmalloc.formattedString
------------------------------------------------
MALLOC:    16556355400 (15789.4 MiB) Bytes in use by application
MALLOC: +   3597598720 ( 3430.9 MiB) Bytes in page heap freelist
MALLOC: +    227350168 (  216.8 MiB) Bytes in central cache freelist
MALLOC: +       884736 (    0.8 MiB) Bytes in transfer cache freelist
MALLOC: +    290449952 (  277.0 MiB) Bytes in thread cache freelists
MALLOC: +     67035392 (   63.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  20739674368 (19778.9 MiB) Actual memory used (physical + swap)
MALLOC: +   1484984320 ( 1416.2 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  22224658688 (21195.1 MiB) Virtual address space used
MALLOC:
MALLOC:         286938              Spans in use
MALLOC:            886              Thread heaps in use
MALLOC:           4096              Tcmalloc page size
------------------------------------------------
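
To put those numbers against the limit: the configured cache is 10737418240 bytes ≈ 10240 MiB, while tcmalloc reports 15789.4 MiB in use by the application and 19778.9 MiB of actual physical memory, i.e. roughly 1.9× the configured cache size.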

Here is the full serverStatus: server_status.json



 Comments   
Comment by Alexander Gorrod [ 22/Nov/18 ]

bigbourin@gmail.com, sorry that the 3.6 release of MongoDB had worse performance characteristics for your application, and thanks very much for your help in understanding and addressing those issues in more recent versions of MongoDB. I'm going to close this ticket since I don't believe we can do any further work to change the behavior you report. If either the workaround or upgrading doesn't help, please do let us know and we will investigate further.

Comment by Adrien Jarthon [ 19/Nov/18 ]

Haha ok, well considering the number of regressions there were in 3.6 I'm probably not gonna install 4.0 anytime soon. I just lost my PSA cluster thanks to https://docs.mongodb.com/manual/reference/read-concern-majority/#disable-read-concern-majority (as many other people will, good job on that), so I'll try to give my customers a bit of stability for now.

Comment by Donald Anderson [ 14/Nov/18 ]

Hello bigbourin@gmail.com, I have worked on some of the cursor issues at a couple of layers in the MongoDB stack. It looks like your upgrade path should be to first use 3.6.8 with mongod --setParameter wiredTigerCursorCacheSize=0, as michael.cahill suggested. As for getting to 4.0, we have identified a separate memory growth problem in 4.0.3. That problem is in the WiredTiger cursor layer and is exacerbated for use cases with many updates of large records. A fix has been made (WT-4418) and should be in an upcoming release. Until you get a release that includes that fix, you should continue to run with mongod --setParameter wiredTigerCursorCacheSize=0, as that works around the 4.0 problem as well; I've verified that memory is pretty stable on 4.0.3 with this parameter, and you should see the same. Finally, once you have a 4.0.x release containing the fix, you can drop the extra parameter. Running with the parameter makes cursor operations a bit slower, though that may not be noticeable for your use case.

Comment by Adrien Jarthon [ 08/Nov/18 ]

I confirm that with the wiredTigerCursorCacheSize change, memory is back to stable levels (11G).
Should I assume that no improvement will be backported to 3.6? And am I supposed to be able to remove this parameter when upgrading to 4.0?

Comment by Adrien Jarthon [ 05/Nov/18 ]

Hi Michael, thanks for the details. I've just restarted with the option to see how far it goes now.

About the improvement in 4.0: that's good, but with so many regressions in 3.6 it's not easy to upgrade to, or even test, 4.0.
The one I explained in SERVER-35958 (CPU on the secondary) is said to be improved in 4.0 too, but if I upgrade my secondary to 3.6 it dies within 24h.
Am I supposed to buy a bigger server just to be able to upgrade through the regressions?
And if I try to upgrade quickly in the hope it'll work better, I'll be stuck there (between 3.6 and 4.0) without being able to downgrade.
I can understand it's cumbersome to backport these fixes, but I'm not the only one who will face trouble trying to upgrade through 3.6 if none of them are backported. If you have no better suggestion I'll buy a bigger machine for this upgrade, but in that case you should at least warn people that upgrading to 3.6 is dangerous.

Comment by Michael Cahill (Inactive) [ 05/Nov/18 ]

bigbourin@gmail.com, as bruce.lucas explained, the memory MongoDB is using in your case is associated with cursors. The difference you are seeing in 3.6 is that MongoDB is representing small changes to documents as "modify" operations in WiredTiger, which causes a buffer to be allocated in cursors that read those documents.

The caching behavior of cursors has been significantly changed in MongoDB 4.0 and above and you should see far fewer active cursors.

In 3.6, please test with:

mongod --setParameter wiredTigerCursorCacheSize=0

This will disable one layer of cursor caching inside MongoDB. There is potential for some performance impact, but in our testing that impact has been small. What you should see is far fewer cursors open, and thus less memory allocated to buffers that hold documents.
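
If mongod is started from a config file rather than the command line, the same parameter can be set there; a sketch, assuming the YAML config format:

setParameter:
  wiredTigerCursorCacheSize: 0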

Comment by Adrien Jarthon [ 04/Nov/18 ]

I'm back at 19G now. I've just uploaded the following 4 days of diagnostic data (10/31 to 11/04) and the updated log file from 10/29 to 11/04 (now), in case it helps.

Comment by Adrien Jarthon [ 31/Oct/18 ]

Ok thx

Comment by Bruce Lucas (Inactive) [ 31/Oct/18 ]

Thanks for the data, Adrien. You are correct: the excess allocated memory at this point is sufficient to get a good signal:

The excess is accounted for by the following allocation sites, all in __wt_modify_apply or __wt_modify_apply_api. I imagine this may be either scratch memory that is kept around too long, or memory in cache that is not properly accounted for. I'll pass this off to a storage expert to take a look.

2018-10-29T22:34:35.038+0100 I -        [ftdc] heapProfile stack1811: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply", 5: "__wt_value_return_upd", 6: "__wt_value_return", 7: "__wt_btcur_next", 8: "0x5621b0879c63", 9: "mongo::WiredTigerRecordStoreCursorBase::next", 10: "mongo::CollectionScan::doWork", 11: "mongo::PlanStage::work", 12: "mongo::PlanExecutor::getNextImpl", 13: "mongo::PlanExecutor::getNext", 14: "0x5621b09b8c8a", 15: "mongo::BasicCommand::enhancedRun", 16: "mongo::Command::publicRun", 17: "0x5621b096676a", 18: "0x5621b0967afb", 19: "mongo::ServiceEntryPointMongod::handleRequest", 20: "mongo::ServiceStateMachine::_processMessage", 21: "mongo::ServiceStateMachine::_runNextInGuard", 22: "0x5621b0973c41", 23: "mongo::transport::ServiceExecutorSynchronous::schedule", 24: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 25: "mongo::ServiceStateMachine::_sourceCallback", 26: "mongo::ServiceStateMachine::_sourceMessage", 27: "mongo::ServiceStateMachine::_runNextInGuard", 28: "0x5621b0973c41", 29: "0x5621b1876545", 30: "0x5621b1e352c4", 31: "0x7f55dfee76ba", 32: "clone" }
2018-10-29T22:33:49.075+0100 I -        [ftdc] heapProfile stack1470: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply_api", 5: "__wt_btcur_modify", 6: "0x5621b08789ee", 7: "mongo::WiredTigerRecordStore::updateWithDamages", 8: "mongo::CollectionImpl::updateDocumentWithDamages", 9: "mongo::UpdateStage::transformAndUpdate", 10: "mongo::UpdateStage::doWork", 11: "mongo::PlanStage::work", 12: "mongo::PlanExecutor::getNextImpl", 13: "mongo::PlanExecutor::getNext", 14: "mongo::PlanExecutor::executePlan", 15: "mongo::performUpdates", 16: "0x5621b0a23330", 17: "0x5621b0a1cdd8", 18: "mongo::Command::publicRun", 19: "0x5621b0966e53", 20: "0x5621b0967afb", 21: "mongo::ServiceEntryPointMongod::handleRequest", 22: "mongo::ServiceStateMachine::_processMessage", 23: "mongo::ServiceStateMachine::_runNextInGuard", 24: "0x5621b0973c41", 25: "mongo::transport::ServiceExecutorSynchronous::schedule", 26: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 27: "mongo::ServiceStateMachine::_sourceCallback", 28: "mongo::ServiceStateMachine::_sourceMessage", 29: "mongo::ServiceStateMachine::_runNextInGuard", 30: "0x5621b0973c41", 31: "0x5621b1876545", 32: "0x5621b1e352c4", 33: "0x7f55dfee76ba", 34: "clone" }
2018-10-29T22:35:02.019+0100 I -        [ftdc] heapProfile stack1908: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply", 5: "__wt_value_return_upd", 6: "__wt_value_return", 7: "__wt_btcur_next", 8: "0x5621b0879c63", 9: "mongo::WiredTigerRecordStoreCursorBase::next", 10: "mongo::CollectionScan::doWork", 11: "mongo::PlanStage::work", 12: "mongo::PlanExecutor::getNextImpl", 13: "mongo::PlanExecutor::getNext", 14: "0x5621b09b8c8a", 15: "mongo::BasicCommand::enhancedRun", 16: "mongo::Command::publicRun", 17: "0x5621b096676a", 18: "0x5621b0967afb", 19: "mongo::ServiceEntryPointMongod::handleRequest", 20: "mongo::ServiceStateMachine::_processMessage", 21: "mongo::ServiceStateMachine::_runNextInGuard", 22: "0x5621b0973c41", 23: "mongo::transport::ServiceExecutorSynchronous::schedule", 24: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 25: "mongo::ServiceStateMachine::_sourceCallback", 26: "mongo::ServiceStateMachine::_sourceMessage", 27: "mongo::ServiceStateMachine::_runNextInGuard", 28: "0x5621b0973c41", 29: "0x5621b1876545", 30: "0x5621b1e352c4", 31: "0x7f55dfee76ba", 32: "clone" }
2018-10-29T22:33:43.038+0100 I -        [ftdc] heapProfile stack1327: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply", 5: "__wt_value_return_upd", 6: "__wt_value_return", 7: "__wt_btcur_search", 8: "0x5621b087d36d", 9: "mongo::WiredTigerRecordStore::updateRecord", 10: "mongo::CollectionImpl::updateDocument", 11: "mongo::UpdateStage::transformAndUpdate", 12: "mongo::UpdateStage::doWork", 13: "mongo::PlanStage::work", 14: "mongo::PlanExecutor::getNextImpl", 15: "mongo::PlanExecutor::getNext", 16: "mongo::PlanExecutor::executePlan", 17: "mongo::performUpdates", 18: "0x5621b0a23330", 19: "0x5621b0a1cdd8", 20: "mongo::Command::publicRun", 21: "0x5621b0966e53", 22: "0x5621b0967afb", 23: "mongo::ServiceEntryPointMongod::handleRequest", 24: "mongo::ServiceStateMachine::_processMessage", 25: "mongo::ServiceStateMachine::_runNextInGuard", 26: "0x5621b0973c41", 27: "mongo::transport::ServiceExecutorSynchronous::schedule", 28: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 29: "mongo::ServiceStateMachine::_sourceCallback", 30: "mongo::ServiceStateMachine::_sourceMessage", 31: "mongo::ServiceStateMachine::_runNextInGuard", 32: "0x5621b0973c41", 33: "0x5621b1876545", 34: "0x5621b1e352c4", 35: "0x7f55dfee76ba", 36: "clone" }
2018-10-29T22:33:46.022+0100 I -        [ftdc] heapProfile stack1388: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply", 5: "__wt_value_return_upd", 6: "__wt_value_return", 7: "__wt_btcur_search", 8: "0x5621b087d36d", 9: "mongo::WiredTigerRecordStoreCursorBase::seekExact", 10: "mongo::WorkingSetCommon::fetch", 11: "mongo::IDHackStage::doWork", 12: "mongo::PlanStage::work", 13: "mongo::UpdateStage::doWork", 14: "mongo::PlanStage::work", 15: "mongo::PlanExecutor::getNextImpl", 16: "mongo::PlanExecutor::getNext", 17: "mongo::PlanExecutor::executePlan", 18: "mongo::performUpdates", 19: "0x5621b0a23330", 20: "0x5621b0a1cdd8", 21: "mongo::Command::publicRun", 22: "0x5621b0966e53", 23: "0x5621b0967afb", 24: "mongo::ServiceEntryPointMongod::handleRequest", 25: "mongo::ServiceStateMachine::_processMessage", 26: "mongo::ServiceStateMachine::_runNextInGuard", 27: "0x5621b0973c41", 28: "mongo::transport::ServiceExecutorSynchronous::schedule", 29: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 30: "mongo::ServiceStateMachine::_sourceCallback", 31: "mongo::ServiceStateMachine::_sourceMessage", 32: "mongo::ServiceStateMachine::_runNextInGuard", 33: "0x5621b0973c41", 34: "0x5621b1876545", 35: "0x5621b1e352c4", 36: "0x7f55dfee76ba", 37: "clone" }
2018-10-29T22:34:04.044+0100 I -        [ftdc] heapProfile stack1648: { 0: "tc_realloc", 1: "0x5621b07d9e17", 2: "__wt_buf_grow_worker", 3: "0x5621b0813d09", 4: "__wt_modify_apply_api", 5: "__wt_btcur_modify", 6: "0x5621b08789ee", 7: "mongo::WiredTigerRecordStore::updateWithDamages", 8: "mongo::CollectionImpl::updateDocumentWithDamages", 9: "mongo::UpdateStage::transformAndUpdate", 10: "mongo::UpdateStage::doWork", 11: "mongo::PlanStage::work", 12: "mongo::PlanExecutor::getNextImpl", 13: "mongo::PlanExecutor::getNext", 14: "mongo::PlanExecutor::executePlan", 15: "mongo::performUpdates", 16: "0x5621b0a23330", 17: "0x5621b0a1cdd8", 18: "mongo::Command::publicRun", 19: "0x5621b0966e53", 20: "0x5621b0967afb", 21: "mongo::ServiceEntryPointMongod::handleRequest", 22: "mongo::ServiceStateMachine::_processMessage", 23: "mongo::ServiceStateMachine::_runNextInGuard", 24: "0x5621b0973c41", 25: "mongo::transport::ServiceExecutorSynchronous::schedule", 26: "mongo::ServiceStateMachine::_scheduleNextWithGuard", 27: "mongo::ServiceStateMachine::_sourceCallback", 28: "mongo::ServiceStateMachine::_sourceMessage", 29: "mongo::ServiceStateMachine::_runNextInGuard", 30: "0x5621b0973c41", 31: "0x5621b1876545", 32: "0x5621b1e352c4", 33: "0x7f55dfee76ba", 34: "clone" }

Comment by Adrien Jarthon [ 31/Oct/18 ]

Ok, I've restarted the primary on the 29th at 9pm UTC. It's not back to 19G yet, but it's already at 15G, which is much more than on 3.4.
I've attached the diagnostic data and log file (which contains the heapProfile lines).

Comment by Adrien Jarthon [ 29/Oct/18 ]

I'll try to do that

Comment by Bruce Lucas (Inactive) [ 29/Oct/18 ]

Hi Adrien,

Thanks for reporting this. The serverStatus output that you uploaded tells us that the memory usage is not due to the WiredTiger cache, but rather is allocated memory outside the cache:

"bytes currently in the cache" :         8593667541,
"current_allocated_bytes" : NumberLong("16818939408"),

The diagnostic data that you uploaded confirms that, but in this case it doesn't tell us much about what is allocating the memory. Over the course of the 4 days covered by the data, the amount of excess memory has remained fairly constant, which is also consistent with the graphs that you provided.

Would you be in a position to enable the built-in heap profiler on an affected node by setting the heapProfilingEnabled parameter to true, e.g. on the command line:

--setParameter heapProfilingEnabled=true

(or equivalent in the config file). Note that this may have some small impact on performance, so please monitor the node(s) where you have enabled this. Please allow maybe a day or two of runtime to get a good signal in the data, then upload both the diagnostic.data and the complete log file(s) since the restart of the node to the portal. (It's important to have the complete log files because they have the crucial allocating stack information.)
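
For reference, the config-file equivalent would be along these lines (a sketch, assuming the YAML config format), and you can confirm the profiler output is reaching the log by grepping for the heapProfile marker (log path assumed):

setParameter:
  heapProfilingEnabled: true

grep heapProfile /var/log/mongodb/mongod.log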

Thanks,
Bruce

Comment by Adrien Jarthon [ 29/Oct/18 ]

@Eric I'm not saying it's the WT cache itself, but that the total should not be substantially more than the max WT cache size we can configure.
About the previous tickets: yeah, there were SERVER-17424, SERVER-20306, and SERVER-22906, to name a few.

So it seems this issue has been present, then improved, and has now regressed.

@Kelsey I've uploaded web1_diagnostic.data.tgz to the portal. BTW, the tool you're using to read this diagnostic data is proprietary, I guess? No chance we can use it as end users? It could be pretty useful for monitoring and performance analysis, aside from bug reports.

Comment by Kelsey Schubert [ 29/Oct/18 ]

bigbourin@gmail.com, would you please upload an archive of the diagnostic.data directory to this upload portal?

Comment by Eric Milkie [ 29/Oct/18 ]

I see now – the ticket SERVER-17424 had similar symptoms in version 3.0.2, in 2015. We will investigate this issue.

Comment by Eric Milkie [ 29/Oct/18 ]

Adrien, how do you know that the physical RAM in use by tcmalloc in the mongod process is all attributable to the WiredTiger page cache? Also, what do you mean by "again" – can you link the original ticket, as I'm not familiar with it?

Comment by Adrien Jarthon [ 29/Oct/18 ]

Sorry, it seems my first image upload lacked labels and I can't edit the description, so here is the fixed version:

PS: it's actually 3.6.7 and then 3.6.8, not 3.6.6.
