[SERVER-20104] WT high memory usage due to high amount of free memory accumulated by TCMalloc Created: 25/Aug/15 Updated: 06/Dec/22 Resolved: 04/Feb/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eitan Klein | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | 32qa, WTmem |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Steps To Reproduce: | mongod --dbpath=d:\mongo --port=27200 --wiredTigerCacheSizeGB=3 --wiredTigerJournalCompressor=zlib --wiredTigerCollectionBlockCompressor=zlib (user workload from Nick J) |
| Participants: | |
| Description |
Environment:
Observation/Issues:
A breakdown of the 5 GB of memory highlights the following: 3 GB is in the WiredTiger cache (as expected). Problem: our memory cache policy is set to allow 1 GB of free memory, but the machine accumulates free memory above this threshold.
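Assuming access to the running instance, a quick way to see how much free memory tcmalloc is holding is the tcmalloc section of serverStatus. A minimal sketch follows; the port matches the repro steps above, and exact field names vary by MongoDB version:

```
# Hedged sketch: inspect tcmalloc's free-memory counters on a live mongod.
# Assumes mongod uses the bundled tcmalloc (the default build) and is
# listening on port 27200 as in the repro steps; field availability varies
# by server version.
mongo --port 27200 --eval 'printjson(db.serverStatus().tcmalloc)'
```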
| Comments |
| Comment by Alexander Gorrod [ 04/Feb/16 ] | |
This ticket reports an issue in a third-party library (tcmalloc). We have identified a workaround; there is nothing more to do here.
| Comment by Alexander Gorrod [ 01/Dec/15 ] | |
bruce.lucas Sorry I sat on this for a long time. I think the ability to tune this behavior using the TCMALLOC_AGGRESSIVE_DECOMMIT=true environment variable should be enough to resolve this ticket. That is, if a workload causes the tcmalloc thread_cache_free_bytes value to grow much larger than the configured 1 GB, configure aggressive decommit via the environment variable.
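A minimal sketch of that workaround, assuming a Linux shell (on Windows cmd, use `set TCMALLOC_AGGRESSIVE_DECOMMIT=true` instead of `export`); the dbpath is illustrative and the WiredTiger flags mirror the repro steps:

```
# Hedged sketch: enable tcmalloc's aggressive decommit before mongod starts.
# TCMALLOC_AGGRESSIVE_DECOMMIT is read by tcmalloc at process startup, so it
# must be set in mongod's environment, not after the fact.
export TCMALLOC_AGGRESSIVE_DECOMMIT=true
mongod --dbpath /data/db --port 27200 --wiredTigerCacheSizeGB=3 \
       --wiredTigerJournalCompressor=zlib \
       --wiredTigerCollectionBlockCompressor=zlib
```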
| Comment by Bruce Lucas (Inactive) [ 26/Oct/15 ] | |
That spike takes the form of an excessively high value for central_cache_free_bytes, versus thread_cache_free_bytes for this workload; so in both cases it is a large accumulation of free memory, albeit in different places. |
| Comment by Nick Judson [ 23/Oct/15 ] | |
You can point the load producer at a Linux-based MongoDB. PM me and I can explain how to set it up.
| Comment by Alexander Gorrod [ 23/Oct/15 ] | |
> That is the difference between this ticket and

It is hard to tell. I spent a fair while attempting to reproduce on Linux without any luck. The load producer is a Windows-specific program, so I can't be sure that I was replicating the same workload. I'm reasonably confident this is specific to Windows.

> I wonder if the thread free lists grow too large, or something else is going wrong?

I don't have an answer to that. Would isolating that down help lead to a fix for this problem? I'm happy to provide access to the machine I've been using for testing, if you think we could get more useful information.
| Comment by Mark Benvenuto [ 22/Oct/15 ] | |
alexander.gorrod Are you seeing similar thread_cache_free_bytes as you see in
I am planning to modify the server to call GetStats. If you attach GDB, this script may help get detailed information: https://gist.github.com/alk/1148755.
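For reference, a minimal sketch of pulling the same stats from a live process without modifying the server, assuming a Linux host where mongod links tcmalloc (tcmalloc overrides malloc_stats() to print its internal breakdown):

```
# Hedged sketch: dump tcmalloc's stats (thread cache, central cache, page
# heap) from a running mongod. The output goes to the process's stderr,
# i.e. the mongod log, not to the gdb session itself.
gdb -p "$(pidof mongod)" -batch -ex 'call (void) malloc_stats()'
```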
| Comment by Alexander Gorrod [ 22/Oct/15 ] | |
I ran the same test with tcmalloc 2.4, and the behavior is not improved.
My current recommendation is to leave this alone in the MongoDB code. It is possible to replicate the aggressive decommit behavior by setting the environment variable TCMALLOC_AGGRESSIVE_DECOMMIT=true.
Upgrading to tcmalloc 2.4 in MongoDB shows a degradation in behavior for this workload. mark.benvenuto Could you review this ticket and let me know if you can think of anything else we could do to alleviate the issue?
| Comment by Alexander Gorrod [ 21/Oct/15 ] | |
A further note: there has been a change in the upstream tcmalloc implementation to add a maximum memory budget for tcmalloc: https://github.com/gperftools/gperftools/issues/451. The configuration option is TCMALLOC_HEAP_LIMIT_MB. That code hasn't made it into a tcmalloc release yet, but it should help with this problem in the future.
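A sketch of what that could look like once the option ships, assuming it is wired up as an environment variable like the other tcmalloc knobs; the 4096 MB budget is purely illustrative:

```
# Hedged sketch: cap tcmalloc's total heap. TCMALLOC_HEAP_LIMIT_MB is not in
# a released tcmalloc as of this comment, so the exact semantics may change.
export TCMALLOC_HEAP_LIMIT_MB=4096
mongod --dbpath /data/db --port 27200 --wiredTigerCacheSizeGB=3
```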
| Comment by Alexander Gorrod [ 21/Oct/15 ] | |
I reproduced this on Windows Server 2012 R2 running the latest 3.2 release candidate, and I see the same behavior as reported here:
You can see that the run went for between 6 and 8 hours. The total thread cache bytes grow to about 3.5 GB. I then adjusted the tcmalloc configuration to enable aggressive decommit, which generated the following results:
That run went for between 12 and 15 hours. You can see that the total thread cache bytes grow only to 1.4 GB, which is still above the configured 1 GB maximum but much less worrying than 3.5x the configured maximum. I intend to re-run with a newer release of tcmalloc (2.4, up from 2.2); the more recent release enables aggressive decommit by default.
| Comment by Nick Judson [ 17/Oct/15 ] | |
As a further note, from a practical perspective this isn't much of an issue for me, although from a technical perspective I'm sure you want to track it down. A WT caching strategy that addresses the issues with the Windows file system cache has a far bigger impact in my world than this.
| Comment by Nick Judson [ 17/Oct/15 ] | |
A quick note that in some recent tests with 3.2 RC0, I only see overage with lower WT cache caps. For example, when setting the cache size to <=4 GB, I see an extra ~1.6–2 GB of memory usage, but when setting the cache size to 10 GB, I don't see any overage.
| Comment by Nick Judson [ 12/Oct/15 ] | |
@Alexander Gorrod - I can provide you with the repro for this, although SERVER-20306 looks like it is probably the same issue and the repro there looks more straightforward. Email me if you want the repro.
| Comment by Alexander Gorrod [ 12/Oct/15 ] | |
I haven't been able to reproduce this behavior by reconstructing the described workload. There is another ticket for a similar memory consumption issue.
| Comment by Eitan Klein [ 26/Aug/15 ] | |
More details from Nick J. Configuration:
~100 threads produced a footprint of 8 GB of heap allocation, but with ~5 GB of free memory in TCMalloc (per-thread cache + central cache).
| Comment by Michael Cahill (Inactive) [ 25/Aug/15 ] | |
It looks like this really concerns caching within tcmalloc: from the stats, the WiredTiger cache stays flat at ~2.4 GB and tcmalloc reports ~2.5 GB of memory allocated, so there is no significant amount of additional memory allocated by MongoDB or WiredTiger. MongoDB sets tcmalloc.max_total_thread_cache_bytes to 1 GB by default; it seems to be going over that in this case. acm, is there someone on your team who can investigate?
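For anyone investigating, a sketch of the knob in play here, with the caveat (an assumption, not verified on this ticket) that mongod sets the tcmalloc.max_total_thread_cache_bytes property programmatically at startup and may therefore override the environment variable:

```
# Hedged sketch: gperftools reads TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES at
# startup; the 512 MB value here is illustrative. Whether it survives
# mongod's own 1 GB programmatic setting is an assumption to verify.
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=$((512 * 1024 * 1024))
mongod --dbpath /data/db --wiredTigerCacheSizeGB=3
```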