[SERVER-17542] Out of Memory crash with wiredTiger Created: 11/Mar/15 Updated: 09/Jun/15 Resolved: 21/Apr/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Chad Kreimendahl | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |

We upgraded to 3.0.0 the day after release and did a mongorestore to entirely rebuild all of our databases into wiredTiger's engine. It worked well for approximately 4 days of low-level usage before throwing an "out of memory" error during some regular (non-heavy) usage. It doesn't appear as if this is related to

We're configured with 4GB of memory allocated to WT. Snappy compression is on, as is prefixCompression. The machine is virtual and dynamically allocates memory. It appears that the system was only using 8GB out of 12GB of potential memory at the time of the crash.
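For context, a setup along these lines would look roughly like the following in a 3.0-era mongod.conf. This is a sketch, not the reporter's actual file: the dbPath is a placeholder, and only the 4GB cache, snappy compression, and prefixCompression settings come from the description above.

```yaml
# Sketch of a mongod.conf (YAML) matching the setup described above.
# /var/lib/mongo is a placeholder path.
storage:
  dbPath: /var/lib/mongo
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4            # the 4GB allocated to WT per the description
    collectionConfig:
      blockCompressor: snappy   # snappy compression
    indexConfig:
      prefixCompression: true   # prefixCompression
```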
| Comments |
| Comment by Michael Templeman [ 09/Jun/15 ] |

I have been regularly encountering an OOM error that I reported here: https://jira.mongodb.org/browse/SERVER-16902. This also seems related to SERVER-16997.
| Comment by Ramon Fernandez Marina [ 21/Apr/15 ] |

sallgeud, since you were not able to recreate the issue we're going to resolve this ticket. However, you may want to tune into

Regards,
| Comment by Chad Kreimendahl [ 10/Apr/15 ] |

I was not able to recreate it after rerunning the tool that does this. We also just switched our dev system back to mmapv1 in preparation for regression testing of a release we have coming out. We'll switch back over to WT in a few weeks, once we get all of our testing done.
| Comment by Ramon Fernandez Marina [ 10/Apr/15 ] |

Hi sallgeud, have you had a chance to run with a pagefile that's allowed to grow up to 12GB? Any progress testing things on your end?

Thanks,
| Comment by Chad Kreimendahl [ 20/Mar/15 ] |

We have our classes keep track of their indexes (for the most part), so it shouldn't be a problem for us to just drop and recreate. We still have potential OOM issues. Going to attempt a run with the updated code that drops by Monday and confirm memory usage.
| Comment by Ramon Fernandez Marina [ 20/Mar/15 ] |

Hi sallgeud, on Windows the fact that the pagefile is not being used doesn't mean it wasn't needed – it may mean that Windows could not find enough memory for a given request (see

Note that for every document removal the indexes need to be updated as well, so on a collection with a large number of documents and a number of indexes a remove() operation may take a long time and consume resources unnecessarily.

Note that you can save the indexes and restore them later:
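A minimal mongo shell sketch of that approach (the collection name mycoll is a placeholder, not from the ticket):

```javascript
// Sketch: capture index definitions, drop the collection, then rebuild the indexes.
// "mycoll" is a placeholder collection name.
var indexes = db.mycoll.getIndexes();   // save index specs before dropping
db.mycoll.drop();                       // removes all documents AND all indexes
indexes.forEach(function (idx) {
    if (idx.name === "_id_") return;    // the _id index is recreated automatically
    db.mycoll.createIndex(idx.key);     // keys only; index options are not carried over
});
```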
This simple approach does not preserve index options, though, so the recommended approach would be for your application to keep track of which indexes are needed. Hope this helps.
| Comment by Chad Kreimendahl [ 20/Mar/15 ] |

It doesn't appear as if we ever used more than ~250MB of swap/pagefile... or at least the system suggests that's the max it's hit. I'm not terribly concerned about Windows-only issues, as we're primarily not on Windows in our higher-level environments with Mongo. The general concern is the crash versus graceful recovery or degradation, which would be more understandable if paging.

I'd be happy to use Collection.Drop(). However, doesn't that also destroy any indexes that were created? We really need to keep all of the indexes while just nuking all of the data, more like a database-style TRUNCATE. So drop is substantially less practical than removeAll, by our estimation. From the docs:

Yes, separate issue. Yes, version 3.0.1 still takes extraordinarily long to show dbs. Just mentioning it here because, in gathering info while debugging this, I find myself doing a ton of waiting.
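To make the trade-off above concrete, a shell sketch with a placeholder collection name: remove({}) behaves like the TRUNCATE described, clearing documents while leaving index definitions intact, whereas drop() discards both.

```javascript
// Sketch: "truncate" a collection while keeping its indexes.
// remove({}) deletes documents one at a time, updating every index per delete
// (slow, but index definitions survive); drop() is fast but removes them too.
db.mycoll.remove({});   // all documents removed; indexes remain in place
db.mycoll.getIndexes(); // still lists the same index definitions
```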
| Comment by Sam Kleinman (Inactive) [ 20/Mar/15 ] |

Thanks for the response, a couple of things:

Thanks for your patience!

Regards,
| Comment by Chad Kreimendahl [ 18/Mar/15 ] |

Uncompressed data information (forgot to include):
| Comment by Chad Kreimendahl [ 18/Mar/15 ] |

How much storage is available to MongoDB in the dbPath directory?

What is your data size, both in terms of the data files on disk and the data size reported by dbStats?

What is the size of the page file?

Is this system running directly on hardware or in some sort of virtual environment?

Have you tested this with the latest stable production release (3.0.1)?

Do you have a sense of what the operation (i.e. the delete operation on the second line of your log) was? It looks like you deleted 50k records just prior to the error. Would it be possible to try to recreate this issue to ensure that some aspect of the delete operation doesn't trigger the error?
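In case it helps anyone gathering the same diagnostics, a mongo shell sketch of how the figures asked about above can be collected; none of this output comes from the original thread:

```javascript
// Sketch: collect the figures asked about above from the mongo shell.
db.stats(1024 * 1024);              // dataSize / storageSize / indexSize, scaled to MB
db.serverStatus().wiredTiger.cache; // WiredTiger cache usage details
db.version();                       // confirms the server release, e.g. "3.0.1"
```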
| Comment by Sam Kleinman (Inactive) [ 18/Mar/15 ] |

Sorry that you've hit this error. Can you provide more information about the environment where this is running? The answers to the following questions may help us debug: