[SERVER-27909] OOM and possible memory leaks Created: 03/Feb/17 Updated: 12/Feb/17 Resolved: 07/Feb/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Vitaly Dyatlov | Assignee: | Kelsey Schubert |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | Bug, crash | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
We have switched one of our servers to WT to test it out. Settings:
After 24 hours it ate all of the server's memory and failed with OOM:
I restarted the mongo server and added 10GB of swap. For some time memory usage stayed at about 65%, and then it started to climb again.
Now all memory is used and roughly 2.5GB of swap is occupied. Please find the last 4 serverStatus reports attached. |
| Comments |
| Comment by Vitaly Dyatlov [ 12/Feb/17 ] | |||||
|
An update: creating and removing a collection helps for some time. Today's picture, 4 days after we refreshed the cursor cache (add/del collection):
All memory was occupied. Server load line: `load average: 120.84, 144.86, 123.1` (the usual load was 4-8). We had to restart the server immediately. Don't you have a condition in your code to limit cache usage? This is indeed a leak and a sign of serious instability. | |||||
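The add/del collection workaround mentioned in this comment can be sketched as a small helper. This is only an illustration of the reporter's stopgap, not a recommended fix: the function name and the scratch collection name are hypothetical, and `db` is assumed to be a pymongo `Database` (or any object exposing `create_collection` and `drop_collection`).

```python
def refresh_cursor_cache(db, scratch_name="cursor_cache_refresh_tmp"):
    """Create and immediately drop a scratch collection.

    Assumption: `db` behaves like a pymongo Database. With pymongo,
    create_collection raises CollectionInvalid if the name already
    exists, so an existing collection is never dropped by accident.
    """
    # Creating the collection fails loudly if the name is already taken.
    db.create_collection(scratch_name)
    # Dropping it again is the step the reporter relied on to release memory.
    db.drop_collection(scratch_name)
    return scratch_name
```

As the comment itself reports, the relief is temporary: memory filled up again within 4 days.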
| Comment by Kelsey Schubert [ 07/Feb/17 ] | |||||
|
Hi dyatlov, The majority of users see significant improvements after switching to WiredTiger and run in production without issue. However, we are aware of a few specific workloads (many tables, or many small updates) where WiredTiger does not perform as well as MMAPv1. I understand the frustration of not being able to upgrade at this time, and we are working very hard to improve WiredTiger's performance for these use cases. In your case, there is a significant number of data handles, which is exacerbating this issue. In MMAPv1, it was preferable to have many collections, as it cannot provide the document-level locking that WiredTiger allows. In addition to Kind regards, | |||||
| Comment by Vitaly Dyatlov [ 07/Feb/17 ] | |||||
|
Thomas, should I read your answer as "WiredTiger is not stable enough for production"? Vitali. | |||||
| Comment by Kelsey Schubert [ 07/Feb/17 ] | |||||
|
Hi dyatlov, Thank you for providing the diagnostic.data. The cause of the OOM appears to be an accumulation of WiredTiger cursors in the session cache. Near the beginning of the diagnostic.data, we see an operation that invalidates the WT cursor cache, after which memory usage drops.
We're investigating long-term solutions to this issue in Please feel free to vote for Kind regards, | |||||
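The cursor accumulation described above can be checked against saved serverStatus output. The following is a minimal sketch, assuming each report has been loaded into a Python dict and that the statistic names used here (`open cursor count`, `open session count`, `connection data handles currently active`, `bytes currently in the cache`, `mem.resident`) appear as they do in 3.4-era serverStatus output. The helper names `wt_summary` and `growth` are hypothetical.

```python
def wt_summary(status):
    """Extract cursor, session, data-handle and memory counters from one serverStatus dict.

    Missing sections yield None rather than raising, since the exact set of
    statistics is an assumption and may differ between versions.
    """
    wt = status.get("wiredTiger", {})
    session = wt.get("session", {})
    return {
        "open_cursors": session.get("open cursor count"),
        "open_sessions": session.get("open session count"),
        "active_data_handles": wt.get("data-handle", {}).get(
            "connection data handles currently active"
        ),
        "cache_bytes": wt.get("cache", {}).get("bytes currently in the cache"),
        "resident_mb": status.get("mem", {}).get("resident"),
    }


def growth(samples, key):
    """Return last-minus-first for one counter across samples, or None if unavailable."""
    values = [wt_summary(sample)[key] for sample in samples]
    if not values or values[0] is None or values[-1] is None:
        return None
    return values[-1] - values[0]
```

Resident memory and `open_cursors` climbing together while `cache_bytes` stays roughly flat would be consistent with the diagnosis in this comment, though it would not prove it.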
| Comment by Vitaly Dyatlov [ 05/Feb/17 ] | |||||
|
A small update to this: 4.5G of the swap file is already occupied (the peak was 4.7G last night). | |||||
| Comment by Kelsey Schubert [ 03/Feb/17 ] | |||||
|
Hi dyatlov, Please note that files uploaded to this ticket are public. If you are concerned about the privacy of your logs, I've created a secure upload portal for you to use. Kind regards, | |||||
| Comment by Kelsey Schubert [ 03/Feb/17 ] | |||||
|
Hi dyatlov, Would you please attach an archive of the diagnostic.data directory and the complete logs for the affected mongod? Thank you, |