[SERVER-48388] MongoDB processes get hung up when trying to acquire lock
Created: 22/May/20 Updated: 07/Jul/20 Resolved: 07/Jul/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Raghu c | Assignee: | Dmitry Agranat |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
We have a three-node MongoDB replica set deployed in our production environment. The primary mongod process hangs after running for about 12 hours. We can see a very large number of threads (around 15,000) stuck in the same stack.
I'm attaching the pstack output and diagnostic metrics. Due to the sensitive nature of the database logs, they cannot be shared, but the logs did show that around 14,484 connections were open.
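As a hypothetical aid for the stress testing mentioned in the comments, here is a minimal Python sketch that inspects a `serverStatus` result for the two counters discussed in this issue: live client connections (`connections.current`) and open cursors (`metrics.cursor.open.total`). The function name `check_server_status` and both thresholds are illustrative assumptions, not MongoDB defaults or limits; the PyMongo call in the comment is only one way to obtain the document.

```python
from typing import Any, List, Mapping


def check_server_status(status: Mapping[str, Any],
                        max_connections: int = 10_000,
                        max_open_cursors: int = 60_000) -> List[str]:
    """Return warnings for a serverStatus document.

    Hypothetical helper: the default thresholds are arbitrary examples.
    Missing fields are treated as 0.
    """
    warnings = []
    # serverStatus reports live client connections under connections.current
    current = status.get("connections", {}).get("current", 0)
    if current > max_connections:
        warnings.append(f"connections.current={current} exceeds {max_connections}")
    # metrics.cursor.open.total counts cursors the server currently holds open
    cursors = (status.get("metrics", {}).get("cursor", {})
               .get("open", {}).get("total", 0))
    if cursors > max_open_cursors:
        warnings.append(f"metrics.cursor.open.total={cursors} exceeds {max_open_cursors}")
    return warnings


# Example of fetching the document with PyMongo (requires a running server):
# from pymongo import MongoClient
# status = MongoClient("mongodb://localhost:27017").admin.command("serverStatus")
# print(check_server_status(status))
```

For instance, a status document with `{"connections": {"current": 14484}}` would produce the single warning `connections.current=14484 exceeds 10000`. Polling this during a load test would show whether connections or cursors climb steadily before a hang.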
| Comments |
| Comment by Dmitry Agranat [ 07/Jul/20 ] |
Glad to hear the issue did not occur on MongoDB 4.0.10. I will go ahead and close this case. Regards, |
| Comment by Raghu c [ 03/Jul/20 ] |
Hi dmitry.agranat, I got a chance to test this on 4.0.10 and the issue did not occur. The issue also did not occur on 3.6.2 when using the MMAPv1 storage engine. |
| Comment by Dmitry Agranat [ 30/Jun/20 ] |
Did you have a chance to test this on MongoDB 3.6.4 or later? If so, did the issue occur again? Thanks, |
| Comment by Raghu c [ 27/May/20 ] |
Hi dmitry.agranat, thank you so much for the quick reply. I will update this thread after testing with one of the latest MongoDB versions. Can you please help me understand what went wrong so that we can include this scenario in our stress testing? The issue you've linked occurs only if more than 64K cursors are opened simultaneously on a data source, and I'm fairly confident that our application does not open that many cursors in such a short time. Something I forgot to mention: even the secondaries become unresponsive, and we are not able to connect to any of the running instances. Also, what is a WiredTiger table? Is it an in-memory data structure that holds all the data before it is persisted to disk? Thanks in advance. |
| Comment by Dmitry Agranat [ 26/May/20 ] |
Hi raghu.9208@gmail.com, thank you for the report. Based on the stack trace and your current MongoDB version (3.6.2), this might be related to Thanks, |