[SERVER-17424] WiredTiger uses substantially more memory than accounted for by cache Created: 02/Mar/15 Updated: 29/Oct/18 Resolved: 19/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 8 |
| Labels: | wttt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Description |
|
Running sysbench shows the following timeline:
Call stack data was obtained by using perf to monitor calls to all tcmalloc entry points. Here's a partially expanded, reversed call tree for a shorter run, up to 449 MB allocated by WT. The timeline shows currently active memory (i.e., accounting for both allocations and frees) charged to each call site over the course of the run. Read the maximum bytes active throughout the run for each call site from the "max.MB" column. The full call tree is attached as well.
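For anyone wanting to repeat this kind of measurement, a rough sketch is below; it is not the exact tooling used here. The probe names, binary path, and output file are assumptions (mongod builds of this era link tcmalloc in statically, so the tc_* entry points should be resolvable in the mongod binary).

    # Add uprobes on the tcmalloc entry points, then record a call stack for
    # every allocation and free during a 60-second window.
    # (tc_new, tc_calloc, tc_realloc, etc. can be added the same way.)
    perf probe -x "$(which mongod)" --add 'tc_malloc size'   # capturing the size argument needs debug info
    perf probe -x "$(which mongod)" --add 'tc_free'
    perf record -g -p "$(pgrep mongod)" \
        -e probe_mongod:tc_malloc -e probe_mongod:tc_free -- sleep 60
    perf script > alloc-stacks.txt   # raw stacks, to be folded into a call tree offline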
|
| Comments |
| Comment by Adrien Jarthon [ 29/Oct/18 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Ok done → | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 28/Oct/18 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
bigbourin@gmail.com, that is not normal. Please open a new ticket with details, including diagnostic data. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Adrien Jarthon [ 27/Oct/18 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Hi, is it considered normal that on 3.6.8 I see a memory usage overhead of 90% compared to the configured max cache size? My configured max cache is 10G:
And my actual usage is 19G RES (21G VIRT):
This has increased since 3.4 (another regression) and seems a bit excessive compared to the numbers announced in this ticket (and the others) for 3.0 and 3.2.
| Comment by Bruce Lucas (Inactive) [ 19/Jun/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
With the accounting change that went into 3.0.3 the overuse of memory fell from about 18% to about 4%, which is now on the same order as other non-cache uses of memory, and is no longer substantial overuse. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 18/Jun/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Behavior is considerably improved in 3.0.3. Running the repro above on a large machine until the amount of memory in the cache reaches 40 GB:
In 3.0.0, the excess memory consumed was about 19% (tcmalloc-reported allocated memory) or 20% (virtual memory). In 3.0.3 that has been reduced to about 4% and 6% respectively. The 6% virtual memory overage will include memory in free pools and non-allocated overhead such as stacks that WT cannot be expected to account for. The 4% allocated memory overage will include some fixed overhead allocated outside of WT, but that should be negligible relative to the 40GB, so presumably this mostly represents the approximate remaining underaccounting of memory by WT, which I believe includes things like internal fragmentation and other allocator overhead. | ||||||||||||||||||||||||||||||||||||||||||||
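For reference, the two overage figures can be read off a running instance directly. A minimal sketch, assuming a tcmalloc build and the 3.0-series serverStatus layout ("bytes currently in the cache" under wiredTiger.cache, current_allocated_bytes under tcmalloc.generic, mem.virtual reported in MB):

    mongo --quiet --eval '
        var ss = db.serverStatus();
        var cacheMB = ss.wiredTiger.cache["bytes currently in the cache"] / 1048576;
        var allocMB = ss.tcmalloc.generic.current_allocated_bytes / 1048576;
        var virtMB  = ss.mem.virtual;    // serverStatus already reports this in MB
        print("WT cache MB:       " + cacheMB.toFixed(0));
        print("allocated overage: " + (100 * (allocMB - cacheMB) / cacheMB).toFixed(1) + "%");
        print("virtual overage:   " + (100 * (virtMB - cacheMB) / cacheMB).toFixed(1) + "%");
    '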
| Comment by Ramon Fernandez Marina [ 15/May/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
pauloandre1987@gmail.com, this issue has been scheduled for the current development cycle. Updates will be posted here as they become available; feel free to watch this ticket to be notified of updates. Regards,
| Comment by Paulo [ 15/May/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Hi, Paulo | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matt Parlane [ 27/Apr/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
I've just started the logging now. I'm not sure why I didn't think of this before, but I just realised that the machine is also running both MMS backup and monitoring daemons. I was thinking about what would cause the increase in open cursors and I realised they were both still running. Apologies for not clicking before, sorry if I've wasted your time. I've just stopped the daemons, I'll see if memory usage still grows. Matt | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 27/Apr/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Hi Matt, Thanks for uploading that information and sorry for the delay in getting back to you. Here are some numbers (presenting them here in text form because they don't vary much over the course of the data that you collected):
The resident memory (1544 MB) will include
This ticket tracks the fact that bytes allocated for the WT cache (that is, the WT cache portion of tcmalloc current_allocated_bytes) are about 20 or 30% higher than the amount that WT accounts for in its "bytes currently in the cache" statistic. However, the difference between tcmalloc current_allocated_bytes (1403 MB) and WT bytes currently in the cache (819 MB) in your case is much larger than that 30%, so I suspect you are actually seeing the effect of other non-cache allocated data structures. In addition, the number of cursors is quite large (21k), so you may also be seeing some memory usage due to excessive caching of cursors (traced by …). If you could do a longer serverStatus data collection where we will see a larger variation over time in memory usage, WT cache usage, cursor count, and possibly other counters, we may be able to get a clearer picture of what correlates with the memory usage in your workload. Could you collect serverStatus data as follows?
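As a stand-in for the exact instructions, here is a minimal sketch that samples the counters discussed above (resident memory, tcmalloc allocated bytes, WT cache bytes, open cursor count) every 10 seconds; the output file name is illustrative, and the original request may have asked for different fields or intervals. Run it against the affected node for several hours.

    mongo --quiet --eval '
        while (true) {
            var ss = db.serverStatus();
            print([new Date().toISOString(),
                   "res MB "   + ss.mem.resident,
                   "alloc MB " + Math.round(ss.tcmalloc.generic.current_allocated_bytes / 1048576),
                   "cache MB " + Math.round(ss.wiredTiger.cache["bytes currently in the cache"] / 1048576),
                   "cursors "  + ss.wiredTiger.session["open cursor count"]].join("  "));
            sleep(10 * 1000);   // one sample every 10 seconds
        }
    ' > wt-mem.log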
Regards, | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matt Parlane [ 13/Apr/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
The secondary is purely an offsite backup, nothing performs reads on it. I've got a script which shuts down the server every night, takes a snapshot of the data directory, and restarts it. The script is probably the reason I didn't pick up the leak till now – I was inadvertently working around it. The machine is currently up-to-date, so it's applying updates more slowly than it would be if it was having to catch up – when I came in this morning it was sitting at about 6GB RSS, and I restarted it which brought it back to the normal level (whatever that is, I can't remember). When I started the process, the RSS was at 1.459GB, when it finished it was at 1.506GB. From what I've seen it would have kept creeping up indefinitely. As I type this it's at 1.516GB. Let me know if I can provide anything else. I can provide access to the server if required. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 13/Apr/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Hi Matt, Sounds like you have something interesting here and I want to follow up. The graphing utility is not available publicly, but I can share the following, which is used to gather the data needed to generate the graphs from your MongoDB instances.
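As a stand-in for the snippet in question, a minimal sketch that records one full serverStatus document per second, as JSON lines, into ss.log (the original script may have differed in detail):

    mongo --quiet --eval '
        while (true) {
            print(JSON.stringify(db.serverStatus()));
            sleep(1000);
        }
    ' > ss.log

Point it at the resyncing secondary with --host while the load is being reproduced, and stop it with Ctrl-C once done.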
Would you be able to reproduce the load in question, run the snippet above against the secondary that is resyncing, and then attach the resulting ss.log file so I can generate the graphs? In addition, can you give me a bit of insight into the workload here? Does the syncing secondary have any kind of read load routed to it? And what kind of load is on the primary that would be passed on to the secondary? Is it primarily inserts, updates, or deletes that would be part of this workload? Thanks,
| Comment by Matt Parlane [ 10/Apr/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
I'm possibly seeing the same issue while a replica set secondary is catching up on oplog updates. Within about 30 minutes of starting, the secondary has consumed almost all the RAM on the system (8GB) and eventually runs into allocation errors. I'm running 3.0.2 with WiredTiger, with a 1GB cache size (also tried 2, 3 and 4 with same results).
uname -a:
See here for more info: https://groups.google.com/d/msg/mongodb-dev/q5YFjI16ZpA/eMvIkq115DoJ Please let me know if I can provide more info – perhaps someone could tell me how to make those cool graphs?
| Comment by Nick Judson [ 16/Mar/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Possible dup: | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bruce Lucas (Inactive) [ 02/Mar/15 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Here's a simple repro. The issue seems to depend on the random inserts into indexes done by sysbench. I'm not sure whether the issue is restricted to that code path per se, or whether that is simply required to fill the cache while doing inserts (as the sequential inserts into the collection and the _id index don't fill the cache). We make the issue happen quickly by doing random inserts into 50 indexes from 10 threads.
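The sketch below is an approximate reconstruction of the repro described above, not the original script: it builds a collection with 50 secondary indexes and then runs random inserts from 10 parallel shell clients, with illustrative collection and field names.

    # Create 50 secondary indexes on fields x0..x49.
    mongo --quiet test --eval '
        for (var i = 0; i < 50; i++) {
            var spec = {}; spec["x" + i] = 1;
            db.c.ensureIndex(spec);
        }
    '
    # 10 parallel clients inserting documents with a random value in every
    # indexed field, so inserts land at random points in all 50 indexes.
    for t in $(seq 10); do
        mongo --quiet test --eval '
            for (var n = 0; n < 1000000; n++) {
                var doc = {};
                for (var i = 0; i < 50; i++) doc["x" + i] = Math.floor(Math.random() * 1e9);
                db.c.insert(doc);
            }
        ' &
    done
    wait

While this runs, "bytes currently in the cache" (serverStatus wiredTiger.cache) can be compared against tcmalloc current_allocated_bytes to observe the gap described below.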
Result below. As with the sysbench test, we see about 20% or so more bytes allocated than are reported to be in the cache.
|