[SERVER-22255] mongod with WT killed by OOM, large journal files present Created: 21/Jan/16 Updated: 25/Jan/16 Resolved: 25/Jan/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Anthony Brodard | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi, We had an issue yesterday on one of our clusters, composed of three shards (1 primary / 2 secondaries / 2 arbiters).
After more than an hour at this step, we saw two things:
Because we didn't know how long the instances would take to start, we copied the whole data directory and removed the journal files. The instance then started correctly and the cluster was available again. So, here are my questions:
Regards, |
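For reference, the workaround described in this report amounts to roughly the following. This is a hedged sketch only, with a hypothetical dbpath and service name, and (as noted in the comments below) removing WiredTiger journal files discards any writes that have not yet been checkpointed:

    # Hypothetical sketch of the workaround above (example dbpath /var/lib/mongodb,
    # example systemd service name mongod). Removing journal files loses
    # un-checkpointed writes -- see the comments below.
    sudo systemctl stop mongod                             # stop the stuck instance
    sudo cp -a /var/lib/mongodb /var/lib/mongodb.bak       # copy the whole data directory first
    sudo rm /var/lib/mongodb/journal/WiredTigerLog.*       # remove the large journal files
    sudo systemctl start mongod                            # restart; mongod creates a fresh journal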
| Comments |
| Comment by Ramon Fernandez Marina [ 25/Jan/16 ] | ||
|
Understood, thanks for letting us know, anthony@sendinblue.com. Please let us know if you run into this issue again so we can investigate further. Regards, | ||
| Comment by Anthony Brodard [ 25/Jan/16 ] | ||
|
Hi Ramon, Thanks for the quick reply. I think this issue can be closed. I will reopen if needed. Regards, | ||
| Comment by Ramon Fernandez Marina [ 21/Jan/16 ] | ||
|
anthony@sendinblue.com, a colleague pointed out that deleting 121GB of journal files is equivalent to deleting that much data from your database. The large journal files may have been caused by your servers running too close to capacity – the data-collection procedure described above (or an upgrade to 3.2.1) will tell us more. | ||
| Comment by Ramon Fernandez Marina [ 21/Jan/16 ] | ||
|
anthony@sendinblue.com, WiredTiger journal files are removed after a checkpoint completes; the behavior you're observing suggests that checkpoints may be taking a very long time. This scenario was reported earlier in another ticket. The out-of-memory condition is likely a symptom of the above, or perhaps of something else. My first recommendation would be to upgrade to MongoDB 3.2.1, which includes several improvements in memory management that could help your use case. If you do upgrade and the issue appears again, please let us know and we'll investigate further. If upgrading is not feasible at this time, can you please run the following from a user shell?
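A representative sketch of such a collection loop (assuming the usual serverStatus()/iostat approach, a local mongod on the default port, and the sysstat package providing iostat; the exact commands and intervals here are placeholders):

    # Log db.serverStatus() once per second to ss.log:
    mongo --quiet --eval "while (true) { printjson(db.serverStatus()); sleep(1000); }" > ss.log &

    # Log extended per-device I/O statistics (MB units, timestamped) once per second to iostat.log:
    iostat -xmt 1 > iostat.log &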
This will collect data in two files, ss.log and iostat.log. If you run these commands from a server restart until the problem appears again, we should be able to determine whether you're indeed running into that scenario. Thanks, |