[SERVER-20159] Out of memory on index build during initial sync even with low cacheSize parameter Created: 27/Aug/15 Updated: 30/Dec/15 Resolved: 18/Sep/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | 3.0.7, 3.1.9 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Anthony Pastor | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | WTmem |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Debian Wheezy (7.7) |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Backport Completed: | |
| Participants: | |
| Description |
When we try to rebuild a 3.0.6 replica set member, the mongod process is killed by the OOM killer.
/var/log/syslog:
/var/log/syslog:
/var/log/mongodb/mongodb.log:
We use:
I've attached a RAM graph to this thread.
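(For context, a low WiredTiger cache size is typically configured in one of two ways; the 2 GB value below is purely illustrative and not the reporter's actual setting:)

    # on the command line (illustrative value)
    mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 2
    # or in the YAML config file:
    #   storage:
    #     wiredTiger:
    #       engineConfig:
    #         cacheSizeGB: 2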
| Comments |
| Comment by Anthony Pastor [ 21/Oct/15 ] |
OK, perfect. We successfully completed the migration to WiredTiger using a higher ulimit value and --wiredTigerEngineConfigString=file_manager=(close_idle_time=300). Thanks a lot again for your feedback.
| Comment by Ramon Fernandez Marina [ 16/Oct/15 ] |
Hi anthony.pastor. WiredTiger keeps idle file handles open for about 24 hours to avoid delays when accessing collections and indexes. While this is a good choice for most users, deployments like yours with such a large number of collections may run into OS limits under certain circumstances. One thing you can do is lower the threshold for closing idle file handles with the command line argument --wiredTigerEngineConfigString=file_manager=(close_idle_time=NNN), where NNN is the number of seconds a file handle must be idle before WiredTiger closes it. You can try setting a low value for NNN and evaluate how that affects your operations based on your data access patterns.
Note however that WiredTiger needs at least two file handles per collection: one for the collection itself and one for each of its indexes (so at least one for the _id index). If your schema contains a very large number of collections and/or indexes you'll need to adjust your OS limits accordingly.
To answer your question about my settings: I had a 100K limit on file handles and the initial sync completed successfully, but I only had the uniques database.
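For illustration, a full invocation might look like the sketch below; the dbpath, replica set name, and the 300-second value are assumptions for the example, not recommendations:

    # close idle WiredTiger file handles after 300s of inactivity (illustrative values)
    mongod --dbpath /var/lib/mongodb --replSet rs0 \
        --storageEngine wiredTiger \
        --wiredTigerEngineConfigString="file_manager=(close_idle_time=300)"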
| Comment by Anthony Pastor [ 16/Oct/15 ] |
Hi. For three days we have been trying to rebuild our cluster with 3.0.7, with no luck.
2015-10-15T02:36:34.281+0200 D REPL [ReplicationExecutor] Scheduling heartbeat to 172.16.21.3:27020 at 2015-10-15T00:36:36.281Z
Currently we're trying to rebuild with a new open files limit: 900 000.
The number of file descriptors never decreases during the rebuild.
$ ls /proc/19802/fd/ | wc -l
Is it normal that the file descriptor count never decreases? Thanks
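(As a sketch, one way to watch this while the sync runs; the pidof lookup assumes a single mongod on the host:)

    # count open file descriptors held by mongod (assumes one mongod on the host)
    ls /proc/$(pidof mongod)/fd | wc -l
    # confirm the open-files limit the running process actually got
    grep 'open files' /proc/$(pidof mongod)/limits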
| Comment by Anthony Pastor [ 18/Sep/15 ] |
Thanks a lot, that's great!
| Comment by Ramon Fernandez Marina [ 18/Sep/15 ] |
anthony.pastor, this is to let you know that a fix for the bug you reported has made it into the MongoDB tree, so we've resolved this issue. The fix will be in the upcoming 3.0.7 stable release, scheduled for late September / early October. If you can't wait, you can download a nightly build for linux-64, with the strong recommendation that you upgrade to 3.0.7 when it becomes available.
Thanks again,
| Comment by Michael Cahill (Inactive) [ 18/Sep/15 ] |
Resolved by the latest WiredTiger drop.
| Comment by Githook User [ 17/Sep/15 ] |
Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>
Message: Merge pull request #2197 from wiredtiger/
(cherry picked from commit 3b723612cd3273662c0fd2d74603047e0ba99460)
| Comment by Githook User [ 16/Sep/15 ] |
Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>
Message: Merge pull request #2197 from wiredtiger/
| Comment by Githook User [ 16/Sep/15 ] |
Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>
Message:
| Comment by Anthony Pastor [ 14/Sep/15 ] |
Thank you very much for your efforts!
| Comment by Ramon Fernandez Marina [ 14/Sep/15 ] |
anthony.pastor, we believe we have identified the root cause of this bug and are actively investigating a fix.
Thanks,
| Comment by Michael Cahill (Inactive) [ 14/Sep/15 ] |
I think the problem here is this code in __wt_cache_eviction_worker:
In this workload we will always have busy==1 during the walk, and I think the thread will also quickly hold the oldest transaction ID in the system because it is single-threaded. This code exists to avoid deadlocks in workloads with tiny caches; I'll try to remove it and see whether we can avoid those hangs in some less problematic way.
| Comment by Keith Bostic (Inactive) [ 11/Sep/15 ] |
Here are the interesting threads after the WT cache has started growing. From the WT stats:
Sep 11 16:47:09 34912239018 /home/rf/tmp/
| Comment by Ramon Fernandez Marina [ 11/Sep/15 ] |
anthony.pastor, after several attempts I've been able to reproduce the symptoms described in this ticket: during the _id index build in uniques.hll_l13_r6, memory consumption climbs enough for mongod to get killed by the OOM killer. I'm now going over the server data I collected in the process. We're keeping this ticket in the "debugging with submitter" state until we find out the root cause of the problem. Thanks again for all your help so far.
Regards,
| Comment by Ramon Fernandez Marina [ 09/Sep/15 ] |
anthony.pastor, this is just to let you know that I was able to start up a node with the data files you sent us. I set up a secondary running WiredTiger and it's now going through initial sync. Given the sheer amount of data this will take a bit, but we're making progress.
Regards,
| Comment by Ramon Fernandez Marina [ 08/Sep/15 ] |
Thanks anthony.pastor, I didn't see parts 17 and 18 last week when I downloaded the rest of the files. Perhaps they were affected by the same problem as the previous missing files? In any event, I see them now and have downloaded them to my computer. We'll keep you posted on my reproduction efforts.
| Comment by Anthony Pastor [ 08/Sep/15 ] |
Upload completed.
| Comment by Anthony Pastor [ 08/Sep/15 ] |
The MD5 hashes are the same. I'm re-uploading them in case you didn't receive those ones.
| Comment by Ramon Fernandez Marina [ 08/Sep/15 ] |
anthony.pastor, I received an error from tar when extracting the file. Given the sheer size of the data, it's taking me a while to find out which part.NN file is causing the error. Here are the MD5 checksums of the files I have:
If you still have the part.NN files and can run md5sum on them and verify the checksums against the list above, we should be able to tell which file(s) got corrupted in flight. We're actively working to get to the bottom of this issue, and your help and patience are much appreciated.
Regards,
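As a sketch, one way to compare on your side (the expected.md5 file name is hypothetical; it would hold the checksum list above):

    # compute checksums for the local pieces and diff against the list above,
    # saved here to a hypothetical file named expected.md5
    md5sum part.* > local.md5
    diff expected.md5 local.md5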
| Comment by Anthony Pastor [ 07/Sep/15 ] |
Were you able to reproduce the problem with the data files?
| Comment by Ramon Fernandez Marina [ 02/Sep/15 ] |
Thanks for the additional information and for uploading the missing files anthony.pastor, much appreciated. We believe the behavior you're seeing could be explained by
| Comment by Anthony Pastor [ 02/Sep/15 ] |
I'm uploading the missing part right now. The rebuild failed again during index building:
, name: "_id_", ns: "uniques.hll_l13_r6" }
| Comment by Ramon Fernandez Marina [ 01/Sep/15 ] |
anthony.pastor, the files I listed above are missing from the upload portal. I was convinced I had seen them before, but we couldn't find any problems with the S3 bucket used for this purpose and we couldn't find a scenario that would make that possible (e.g. a partial update). Could you please re-upload them?
Thanks,
| Comment by Ramon Fernandez Marina [ 01/Sep/15 ] |
Thanks for the heads-up anthony.pastor – and for uploading the files! Downloading them now; I will post updates to this ticket as they become available.
EDIT:
Will check to see if there was an error on our end and let you know if you need to re-upload these. Apologies for the inconvenience.
| Comment by Anthony Pastor [ 01/Sep/15 ] |
Uploads are complete.
| Comment by Bruce Lucas (Inactive) [ 01/Sep/15 ] |
Thanks Anthony. We can indeed see part.00 through part.06, about 5 GB each. Let us know when the upload has completed.
| Comment by Anthony Pastor [ 01/Sep/15 ] |
I restarted the secondary and it is rebuilding indexes. Used RAM is still very high while the rebuild is in progress (less than 3 GB free on a 128 GB server).
About the data files: the compressed file is 93 GB.
| Comment by Ramon Fernandez Marina [ 29/Aug/15 ] |
The captured data shows a spike in the bytes held in the cache at the very end, right before the node is killed by the OS:
Not sure if this is relevant yet.
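(For reference, this metric can be sampled from a running node; a minimal sketch, assuming mongod is listening on the default port:)

    # sample the WiredTiger cache usage reported by serverStatus (default port assumed)
    mongo --quiet --eval 'print(db.serverStatus().wiredTiger.cache["bytes currently in the cache"])'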
| Comment by Ramon Fernandez Marina [ 28/Aug/15 ] |
anthony.pastor, here are the next steps we can try to investigate this problem:
The contents of /var/lib/mongodb from the secondary should be enough to recreate this problem on our end. If in step 3 above the size of the data is larger than 5GB, then you can run the command sketched below.
This will produce a series of part.XX files, where XX is a number; you can then upload those files via the same portal described above.
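A command of roughly this shape produces such pieces; gzip compression and a 4 GB piece size (to stay comfortably under the portal's 5 GB limit) are assumptions here:

    # archive the dbpath and split it into pieces named part.00, part.01, ...
    tar czf - /var/lib/mongodb | split -b 4G -d -a 2 - part.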
| Comment by Anthony Pastor [ 28/Aug/15 ] |
On the secondary node I ran: rs.slaveOk()
Here's the output:
On the primary we have almost 1.3 TB of uniques.* files.
| Comment by Ramon Fernandez Marina [ 28/Aug/15 ] |
Apologies if I wasn't clear, anthony.pastor – yes, the above instructions are for the secondary. On the primary there's a file per database, so you'd need to upload /var/lib/mongodb/uniques.*.
The reason I sent instructions for the secondary is that WiredTiger uses a file per collection, and since index building happens after the collection has been replicated, all the collection data should be present on the secondary. But I also realize that on the primary the data files will be 2GB in size at most, so it will be easier to upload /var/lib/mongodb/uniques.*.
bruce.lucas@10gen.com, do we need the data from the primary, the secondary, or does it not matter?
Thanks,
| Comment by Anthony Pastor [ 28/Aug/15 ] |
The primary isn't using WiredTiger yet.
| Comment by Ramon Fernandez Marina [ 28/Aug/15 ] |
Having a 2.6.10 primary and a 3.0.x secondary should work, as that's the upgrade path for replica sets, so there's nothing wrong with that. To find the exact file name for this collection you can run this from the mongo shell:
You should find the corresponding file in /var/lib/mongodb. Please note that the upload portal limits files to 5GB in size; if this file is larger it will need to be split – let me know if you need a recipe for doing that.
Thanks,
| Comment by Anthony Pastor [ 28/Aug/15 ] |
Thank you for your answer. I notice I hadn't mentioned that the live primary is still running 2.6.10, without WiredTiger yet. The issue is happening during the switch to WiredTiger.
How can I get the exact file name on disk for this collection?
| Comment by Ramon Fernandez Marina [ 28/Aug/15 ] |
Thanks for collecting this information anthony.pastor, we're taking a look now. I tried to reproduce this behavior yesterday, unsuccessfully. A colleague points out that this may be very difficult to reproduce and that we may need to see the database files directly to understand what's triggering this condition.
If you would consider sharing the uniques.hll_l13_r6 collection with us, you may upload it privately and securely via this portal. You can find the exact file name on disk for this collection by running the following from the shell:
Thanks,
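As a sketch of one way to do this on a 3.0 WiredTiger node (the collection's stats include the WiredTiger table name, which matches a .wt file under the dbpath):

    # print the WiredTiger URI for the collection; the trailing table name
    # corresponds to a <name>.wt file under the dbpath (e.g. /var/lib/mongodb)
    mongo uniques --quiet --eval 'rs.slaveOk(); print(db.hll_l13_r6.stats().wiredTiger.uri)'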
| Comment by Anthony Pastor [ 28/Aug/15 ] |
Hi, I restarted the replication yesterday to capture the metrics you asked for. The mongod was once again killed by the OOM killer during the index build on the same collection:
| Comment by Anthony Pastor [ 28/Aug/15 ] |
| Comment by Ramon Fernandez Marina [ 27/Aug/15 ] |
Thanks for your report anthony.pastor, and sorry you're experiencing this problem. I'd say this is definitely unexpected, so we'll need some more information to understand what's happening. I'd like to ask for two things.
Thanks,