[SERVER-63169] Fix deadlock in balancer_defragmentation_merge_chunks Created: 01/Feb/22 Updated: 29/Oct/23 Resolved: 02/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Allison Easton | Assignee: | Allison Easton |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 42 | ||||
| Description |
|
To prevent problems with step-down during defragmentation, user cancellation of defragmentation no longer removes the defragmentCollection flag on the collection. Instead, it checks whether there is an existing defragmentation state for the collection and, if there is one, changes the phase to kSplitChunks. This check requires acquiring a lock. If the test has completed phase 1 when defragmentation is cancelled, the balancer thread will be waiting at the failpoint while holding the mutex, which deadlocks with the cancellation. The fix is to have the failpoint turn phase transitions into a no-op instead of pausing. This removes the deadlock, and each balancer round will retry the phase transition until the failpoint is cleared. |
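The difference between the two failpoint behaviours described above can be sketched with a toy model. This is only an illustration in Python, not the server's C++ code: the class `DefragSketch`, its methods, and the phase labels `"phase1"` and `"phase2"` are hypothetical names, and the `timeout` on cancellation exists only so the blocked case can be observed instead of hanging.

```python
import threading


class DefragSketch:
    """Toy model of the balancer/cancellation interaction (hypothetical names)."""

    def __init__(self, pause_on_failpoint):
        self.mutex = threading.Lock()
        self.phase = "phase1"                      # placeholder phase label
        self.failpoint_enabled = True              # stands in for the test failpoint
        self.failpoint_cleared = threading.Event()
        self.holding_mutex = threading.Event()     # set once a round holds the mutex
        self.pause_on_failpoint = pause_on_failpoint

    def clear_failpoint(self):
        self.failpoint_enabled = False
        self.failpoint_cleared.set()

    def balancer_round(self):
        """One balancer round: try to transition to the next phase."""
        with self.mutex:
            self.holding_mutex.set()
            if self.failpoint_enabled:
                if self.pause_on_failpoint:
                    # Old behaviour: wait at the failpoint while holding the mutex.
                    self.failpoint_cleared.wait()
                else:
                    # New behaviour: the transition is a no-op; retry next round.
                    return False
            if self.phase == "phase1":
                self.phase = "phase2"
            return True

    def cancel(self, timeout):
        """User cancellation: needs the same mutex to change the phase."""
        if not self.mutex.acquire(timeout=timeout):
            return False  # without the timeout this would block forever
        try:
            self.phase = "kSplitChunks"
            return True
        finally:
            self.mutex.release()
```

With `pause_on_failpoint=True`, a `cancel` call issued while a round is waiting at the failpoint cannot take the mutex and returns `False`. With `pause_on_failpoint=False`, `balancer_round` returns immediately without changing the phase, so `cancel` succeeds, and later rounds simply retry once `clear_failpoint` has been called.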
| Comments |
| Comment by Githook User [ 02/Feb/22 ] |
|
Author: {'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'} Message: |