[SERVER-19673] Excessive memory allocated by WiredTiger journal Created: 30/Jul/15 Updated: 11/Jan/16 Resolved: 07/Aug/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | 3.0.6, 3.1.7 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Frederick Cheung | Assignee: | Susan LoVerso |
| Resolution: | Done | Votes: | 2 |
| Labels: | WTmem, mms-s |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | amazon linux 2015.03 on r3.2xl instances |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Backport Completed: | |
| Participants: | |
| Description |
We have a MongoDB replica set consisting of 2 servers + 1 arbiter, all running MongoDB 3.0.4 with the WiredTiger storage engine. I updated the secondary to 3.0.5. It appeared to start syncing from the primary (the state transitioned to SECONDARY and the logs said it was syncing from the primary) but crashed after about 40 seconds with this output in the logs:

I restarted mongod and the same thing happened. I downgraded to MongoDB 3.0.4 and the secondary was able to rejoin the replica set and catch up as normal. The secondary wasn't under any read load at the time, but the primary was under a heavy write load.
| Comments |
| Comment by Frederick Cheung [ 11/Aug/15 ] |
Ah, even on our 3.0.4 instance that metric was 13G on our secondary.
| Comment by Bruce Lucas (Inactive) [ 11/Aug/15 ] |
The key statistic for diagnosing this issue is the following:

This is memory used by mongod for the WT journal, and it is not limited by the WT cache setting. If it is large (e.g. several GB), an OOM kill may result. The fix for this issue limits the amount of memory used by the WT journal to (I believe) 32 MB.
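The ticket does not quote the exact counter name, so the following is a minimal sketch, assuming a pymongo client, of how one might dump the WiredTiger log/journal counters reported by serverStatus to find the statistic referred to above; the connection URI is a placeholder.

```python
# Hypothetical sketch (not from the ticket): inspect the WiredTiger "log"
# (journal) counters that serverStatus reports. Field names vary across
# server versions, so print anything that looks like a buffer/memory size
# rather than assuming one exact key.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
status = client.admin.command("serverStatus")

log_section = status.get("wiredTiger", {}).get("log", {})
for name, value in sorted(log_section.items()):
    if "buffer" in name or "size" in name:
        print(f"{name}: {value}")
```

Comparing these values between a 3.0.4 and a 3.0.5 node (as Frederick did above) would show whether journal memory, rather than the WT cache, is what grows before the OOM.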
| Comment by Ramon Fernandez Marina [ 07/Aug/15 ] |
The fix for
This was merged into mongo's master and v3.0 branches, so I'm resolving this ticket.
| Comment by Michael Cahill (Inactive) [ 03/Aug/15 ] |
The backport of https://evergreen.mongodb.com/version/55bf0d8a3ff12268ad000030_0
| Comment by Daniel Pasette (Inactive) [ 31/Jul/15 ] |
sue.loverso has identified the fix for this issue as
| Comment by Daniel Pasette (Inactive) [ 31/Jul/15 ] |
Not sure if this is a duplicate, but same stack trace as
| Comment by Daniel Pasette (Inactive) [ 31/Jul/15 ] |
I managed to reproduce this with the additional information from fcheung (thank you!) and by adding an abort() in the error-handling code for WiredTiger. We're still looking for the root cause.
| Comment by Frederick Cheung [ 31/Jul/15 ] |
Our deployment is a 2-node + arbiter replica set. The nodes are r3.2xl EC2 instances with a 4 TB EBS volume. The write workload is pure insert: we insert documents into a new collection, and when we are done, clients start reading from the newly created collection (the clients have a read preference of primaryPreferred, so they shouldn't be reading from the secondary). I've attached the stats for one of those collections. It is one of our biggest; the smallest ones have 200-300k documents of 20k each.
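For reference, a minimal sketch of the primaryPreferred read preference described above, assuming a pymongo client; host names, replica set name, and database/collection names are placeholders, not taken from the ticket.

```python
# Hypothetical sketch (not from the ticket): clients configured with
# primaryPreferred, so reads normally go to the primary and only fall back
# to the secondary if the primary is unreachable.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://node1.example.com:27017,node2.example.com:27017"
    "/?replicaSet=rs0&readPreference=primaryPreferred"
)

doc = client["mydb"]["mycollection"].find_one()
```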
| Comment by Frederick Cheung [ 31/Jul/15 ] |
Collection stats from one of our collections.
| Comment by Daniel Pasette (Inactive) [ 31/Jul/15 ] |
fcheung/amit.ambasta@delhivery.com: thanks for the additional information. It does seem unlikely to be a real out-of-memory error, but that is the message the storage engine is returning. I'm wondering if you could provide some more details about your deployment. fcheung mentioned that the primary was under heavy write load. Is this also the case for you, Amit? Can you provide details on the collections involved (collection stats will provide average document size, number of indexes, count, etc.) and also the types of operations (updates, inserts, deletes)? Thanks.
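For readers following along, here is a minimal sketch, assuming a pymongo client, of pulling the collection statistics requested above via the collStats command; the database and collection names are placeholders.

```python
# Hypothetical sketch (not from the ticket): fetch the collection statistics
# mentioned above (document count, average document size, index count).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
stats = client["mydb"].command("collStats", "mycollection")

print("count:      ", stats.get("count"))        # number of documents
print("avgObjSize: ", stats.get("avgObjSize"))   # average document size in bytes
print("nindexes:   ", stats.get("nindexes"))     # number of indexes
print("storageSize:", stats.get("storageSize"))  # bytes allocated on disk
```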
| Comment by Amit Ambasta [ 31/Jul/15 ] |
We are facing the same issue after upgrading to 3.0.5.
| Comment by Frederick Cheung [ 31/Jul/15 ] |
Hi, I've attached the config file (and the log, since Jira seems to have mangled the backtrace section). The only thing running other than mongod that isn't part of the stock Amazon Linux distribution is a New Relic server monitoring daemon. There shouldn't have been memory problems: mongod had basically all of the 61 GB that instance has to play with, and downgrading to MongoDB 3.0.4 fixed the issue.
| Comment by Daniel Pasette (Inactive) [ 30/Jul/15 ] |
The log message says "Cannot allocate memory"