[SERVER-53852] MongoDB hangs randomly
Created: 16/Jan/21 Updated: 29/Oct/23 Resolved: 20/Feb/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.2 |
| Fix Version/s: | 4.4.6, 5.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ashish Madeti | Assignee: | Sergey Galtsev (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Steps To Reproduce: | Sorry, but I don't know how to reproduce it. As I said, it hangs randomly. |
| Sprint: | Security 2021-02-08, Security 2021-02-22 |
| Case: | (copied to CRM) |
| Description |
I am running a MongoDB 4.4.2 cluster with one primary, one secondary, and one hidden secondary. On the hidden secondary, MongoDB sometimes just hangs (roughly once every two days); once it also happened on the primary. By "hangs", I mean:
I found https://jira.mongodb.org/browse/SERVER-34190, which looks like a similar issue, although it was fixed in 3.6.4. I have attached the files that were requested in that issue:
Please let me know if you need anything else or if you want me to run any commands.
| Comments |
| Comment by Githook User [ 07/Apr/21 ] |
Author: Sergey Galtsev (sergey.galtsev@mongodb.com, username: brushless-glitch)
Message: (cherry picked from commit 51142d61eeea0a30b2691680663d60c17441afce)
| Comment by Ashish Madeti [ 21/Feb/21 ] |
Hello.
I wanted to ask which version the fix will ship in. Does the 'Fix Version' field mean that the fix will be available in MongoDB 5.0?
| Comment by Sergey Galtsev (Inactive) [ 20/Feb/21 ] |
Separate ticket.
| Comment by Sergey Galtsev (Inactive) [ 20/Feb/21 ] |
The patch broke the Mac build because std::quick_exit is not supported on that platform.
| Comment by Githook User [ 19/Feb/21 ] |
Author: Sergey Galtsev (sergey.galtsev@mongodb.com, username: brushless-glitch)
Message:
| Comment by Bruce Lucas (Inactive) [ 12/Feb/21 ] |
sergey.galtsev, mark.benvenuto: I think a customer is unlikely to see a message written to stderr. Would it be a good idea to write the message to the log file as well, or instead, but without taking the lock? This might leave a log file that is not valid JSON, but that seems better than not recording the error anywhere.
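The suggestion above can be sketched as a logger with two write paths: a normal path that takes a lock, and an emergency path that writes straight to a file descriptor opened in advance, with no lock at all. This is a hypothetical Python illustration, not MongoDB's C++ logger; the class Logger and its methods log() and emergency_log() are invented names.

```python
import os
import threading


class Logger:
    """Hypothetical logger: normal writes take a lock, emergency writes do not."""

    def __init__(self, path):
        # Open the descriptor once, up front, so the emergency path never
        # has to open files or acquire anything.
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._lock = threading.Lock()

    def log(self, message):
        # Normal path: serialize writers with the lock.
        with self._lock:
            os.write(self._fd, (message + "\n").encode())

    def emergency_log(self, message):
        # No lock, so this cannot deadlock even if the caller already holds
        # self._lock. The line may land right after a partially written
        # record, leaving the file not valid JSON.
        os.write(self._fd, (message + "\n").encode())

    def close(self):
        os.close(self._fd)
```

With the lock held (simulating a thread stuck inside log()), emergency_log() still appends its line. In C or C++ the equivalent would be a raw write(2) on a descriptor opened at startup; write(2) is on the POSIX list of async-signal-safe functions, whereas most logging machinery is not.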
| Comment by Edwin Zhou [ 19/Jan/21 ] |
Thank you for the detailed description and for attaching all of the necessary files; it helped expedite the investigation. We believe a lock was acquired for logging and then an error occurred that caused logging to stop. The resulting signal is handled, and the handler tries to log the error recursively. However, the logging code attempts to acquire that same lock, which is still held, so the process hangs. I'll pass this along to the Security team for further investigation.
Best,
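The self-deadlock described above can be reproduced in miniature. This is a hypothetical Python sketch, not MongoDB's code: log_lock, log(), and fatal_handler() are invented names, a plain function call stands in for the signal handler, and the handler uses an acquire timeout so the deadlock shows up as a failed acquire instead of a real hang.

```python
import threading

# Non-reentrant lock, standing in for the logging lock described above.
log_lock = threading.Lock()
records = []


def fatal_handler(reason, timeout=0.1):
    """Stand-in for the crash handler that tries to log the error."""
    # A real handler would block forever here; the timeout lets the sketch
    # report the self-deadlock instead of hanging.
    if not log_lock.acquire(timeout=timeout):
        return False  # lock already held (here, by the caller itself)
    try:
        records.append("fatal: " + reason)
    finally:
        log_lock.release()
    return True


def log(message, fail=False):
    with log_lock:
        if fail:
            # Simulate the write failing while the lock is held, and the
            # error path re-entering logging.
            return fatal_handler("log write failed")
        records.append(message)
        return True
```

Here log("bad message", fail=True) returns False after the timeout, because fatal_handler() tries to take the lock that log() still holds, while fatal_handler() called outside log() succeeds. Swapping in threading.RLock would avoid this particular self-deadlock, but a reentrant lock would let the handler continue on top of a half-finished write, which is one reason a lock-free fallback path may be preferred.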
| Comment by Ashish Madeti [ 16/Jan/21 ] |
I failed to mention in my initial description that I recently upgraded this cluster from MongoDB 3.6 to MongoDB 4.4 (via 4.0 and 4.2), and the issue only started happening after that upgrade. I am running the hidden secondary on a DigitalOcean droplet with 12 vCPUs and 48 GB of RAM.