Details
Bug
Resolution: Fixed
Major - P3
Fully Compatible
ALL
42
Description
To prevent problems with stepdown during defragmentation, user cancellation of defragmentation no longer removes the defragmentCollection flag from the collection. Instead, it checks whether a defragmentation state exists for the collection and, if so, changes the phase to kSplitChunks. This check requires acquiring a lock. If the test has completed phase 1 when defragmentation is cancelled, the balancer thread is waiting at the failpoint while holding the mutex, so the cancellation deadlocks.
This can be solved by changing the failpoint so that it does not pause, but instead makes phase transitions a no-op. The balancer thread then no longer blocks while holding the mutex, which removes the deadlock, and each balancer round will try to transition phases again until the failpoint is cleared.
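The proposed behaviour can be illustrated with a small toy model. This is only a sketch in Python, not the server code: the class `DefragmentationSketch`, its methods, and the phase labels other than kSplitChunks are hypothetical names chosen for illustration. It shows a failpoint that skips the phase transition and releases the mutex instead of pausing while holding it, so a cancellation can still take the lock.

```python
import threading


class DefragmentationSketch:
    """Toy model of the proposed fix; all names here are hypothetical."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.phase = "phase1"            # hypothetical label for the first phase
        self.failpoint_enabled = False   # stands in for the test failpoint
        self.skipped_transitions = 0

    def balancer_round(self, next_phase):
        """Try to move to next_phase; a no-op while the failpoint is set."""
        with self._mutex:
            if self.failpoint_enabled:
                # Do not pause here: return so the mutex is released.
                self.skipped_transitions += 1
                return False
            self.phase = next_phase
            return True

    def cancel(self, timeout=1.0):
        """User cancellation: needs the same mutex to change the phase."""
        if not self._mutex.acquire(timeout=timeout):
            raise RuntimeError("cancellation could not take the mutex")
        try:
            self.phase = "kSplitChunks"
        finally:
            self._mutex.release()


state = DefragmentationSketch()
state.failpoint_enabled = True
state.balancer_round("phase2")    # skipped, mutex released
state.balancer_round("phase2")    # retried on the next round, skipped again
state.cancel()                    # succeeds because nothing holds the mutex
state.failpoint_enabled = False
moved = state.balancer_round("finished")  # transitions resume once cleared
```

In this model every balancer round retries the transition, so clearing the failpoint is enough for progress to resume, and no thread ever waits while holding the mutex.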