[SERVER-53415] Intent Lock timeout lead to server crash Created: 17/Dec/20 Updated: 21/Jan/21 Resolved: 14/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.1 |
| Fix Version/s: | 4.9.0, 4.4.4 |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Yan Zhou | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Sprint: | Execution Team 2021-01-25 |
| Description |
|
Our MongoDB server had a few instances where it crashed after seeing the following log message,
|
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 14/Jan/21 ] |
|
|
| Comment by Dianna Hohensee (Inactive) [ 13/Jan/21 ] |
|
I see that in v4.4 sharding takes a MODE_IS ResourceMutex in an onCommit handler here – and registered here. In master, unlike v4.4, there's an UninterruptibleLockGuard on the same code. That was put in by |
| Comment by Bruce Lucas (Inactive) [ 17/Dec/20 ] |
|
Thanks yan.zhou@cubistsystematic.com, we will investigate. |
| Comment by Yan Zhou [ 17/Dec/20 ] |
|
I have attached the tail of the logs, starting from where things still look normal (transactions succeeding, etc.). The crash appears to happen when a chunk migration coincides with the start of a transaction. I understand that bulk data insertion without disabling the balancer is not optimal and can impact performance, but I can't disable the balancer for every short burst of write activity. The worst I expected was either timeout errors or temporarily slow performance. Just FYI, MongoDB is deployed via Docker, |
| Comment by Bruce Lucas (Inactive) [ 17/Dec/20 ] |
|
yan.zhou@cubistsystematic.com, can you please attach a complete log file showing such a crash to this ticket? Alternatively, you can upload it to this secure private portal if it's too large to attach or if it contains sensitive information that you can't share on this public ticket. In addition, please archive and attach the contents of $dbpath/diagnostic.data from a node that has experienced this issue recently (past few days to a week or so), together with the log file(s) for that node. |