[SERVER-67936] Server stuck when systemctl autorestart after mongo crash Created: 11/Jul/22 Updated: 26/Oct/22 Resolved: 26/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | moroine bentefrit | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi, I'm running MongoDB 4.4 on Ubuntu 20. I've created the following file:
MongoDB runs on a server shared with other services, and I suspect an out-of-memory error. Here are the logs from journalctl:
The issue is that `systemctl status mongod` reports the service as active, and I can even see the `mongod` process, but I cannot access the server. The memory usage of the process is also very low (about 50 MB), and the `mongod` process didn't write any startup_log. When I manually run `systemctl restart mongod`, it works. My guess is that the automatic restart happens too soon after the crash, before the memory has been released. Even so, `systemctl status mongod` shouldn't report active. |
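The symptom described above (service reported as active, server unreachable) can be told apart from a healthy instance by probing the listening port directly rather than relying on `systemctl status`. The following is a minimal sketch, not part of the original report. It assumes `mongod` listens on `127.0.0.1` at MongoDB's default port 27017, which the ticket does not state, and the function name `accepts_connections` is hypothetical. It only tests that a TCP connection can be opened; it does not verify that the server answers queries.

```python
import socket


def accepts_connections(host: str = "127.0.0.1", port: int = 27017,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    host and port are assumptions (MongoDB's default port); adjust them to
    match the bind address and port of the actual deployment.
    """
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if accepts_connections():
        print("mongod port is accepting connections")
    else:
        print("mongod port is not reachable")
```

Run next to `systemctl status mongod`, a result of "not reachable" while the unit shows active would match the state reported here.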
| Comments |
| Comment by Edwin Zhou [ 26/Oct/22 ] |
|
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. |
| Comment by Edwin Zhou [ 04/Oct/22 ] |
|
Hi moroine.bentefrit@gmail.com, We're still interested in this issue but need additional information to diagnose the problem. If you hit this problem again, would you please archive (tar or zip) and upload to the secure upload portal:
Best, |
| Comment by Edwin Zhou [ 16/Sep/22 ] |
|
Hi moroine.bentefrit@gmail.com, If I'm understanding correctly, after the mongod was killed due to OOM, the process doesn't restart unless you manually intervene with systemctl restart mongod. This occurs on MongoDB v5.0.6 and not v4.4 as originally reported in the description. It is unusual for the mongod to not fully restart after hitting OOM. However, it also seems that this problem is not reliably reproducible. Without diagnostic data and logs from the mongod, we will find it difficult to diagnose this issue. If you hit this problem again, would you please archive (tar or zip) and upload to the secure upload portal:
Best, |
| Comment by moroine bentefrit [ 31/Aug/22 ] |
|
No, I mean the bug occurred under `5.0.6`, not `4.4`. I believe the bug still exists; I had only one case where MongoDB crashed due to OOM. I tried to reproduce it, but I'm not able to. |
| Comment by Edwin Zhou [ 30/Aug/22 ] |
|
Hi moroine.bentefrit@gmail.com, Thank you for the follow-up! It sounds like you're no longer seeing this problem after upgrading from 4.4 to 5.0.6. Did you make any changes to your NodeJS application that helped prevent it from using too much memory, which resulted in OOM on MongoDB? |
| Comment by moroine bentefrit [ 30/Aug/22 ] |
|
Thanks for checking. Unfortunately, I don't have the logs anymore, as it was on a PreProd server. I forgot that we had updated MongoDB, so the version was 5.0.6 and not 4.4. I tried to reproduce the issue but, unfortunately, I'm not able to.
On my server, I have a NodeJS application that uses MongoDB. I can no longer reproduce the issue, but it looks like it was caused by the NodeJS process using too much RAM. The OOM killer then killed both NodeJS and MongoDB, but MongoDB failed to restart, as mentioned in the bug description. |
| Comment by Edwin Zhou [ 29/Aug/22 ] |
|
Hi moroine.bentefrit@gmail.com, Thank you for your patience while I investigate this issue. Could you please provide some additional diagnostic data covering this behavior? I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:
Best, |