Type: Bug
Resolution: Gone away
Priority: Major - P3
Affects Version/s: None
Component/s: Index Maintenance
Operating System: ALL
Sprint: Execution Team 2020-06-01
This problem came up in a build failure that resulted in a test timeout on the primary node.
- [T1] We have an in-progress two-phase index build.
- [T2] A user runs the command to abort the in-progress index build.
- [T1] The index build is ready to vote for committing.
- [T1] Runs the vote command locally via the DBDirectClient [T3].
- [T2] Performs the abort and waits until the index builder thread [T1] receives the signal, while continuing to hold the exclusive collection lock.
- [T4] An applyOps command is run that requires the exclusive global lock. The request is enqueued because [T2] holds the intent global lock.
- [T3] Tries to read the index build entry in the config.system.indexBuilds collection. This requires the intent global lock, so its request is enqueued behind [T4]'s global lock request.
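The key detail in the last two steps is that [T3]'s intent (IX) request is compatible with the IX lock [T2] actually holds, yet it still waits because it arrived after [T4]'s pending exclusive (X) request. Below is a minimal Python sketch of that behavior, assuming a single global resource, only the IX and X modes, and strict first-come-first-served granting. This is a simplification: the server's lock manager has more modes and its own fairness rules. `FifoLock` and `COMPATIBLE` are hypothetical names used only for illustration.

```python
from collections import deque

# Assumption: only IX and X are modeled; real lock managers have more modes.
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "X"): False,
    ("X", "IX"): False,
    ("X", "X"): False,
}


class FifoLock:
    """Toy model of one lock resource with a strict FIFO wait queue."""

    def __init__(self):
        self.granted = {}       # thread name -> granted mode
        self.waiting = deque()  # (thread name, mode) in arrival order

    def _compatible_with_granted(self, mode):
        return all(COMPATIBLE[(held, mode)] for held in self.granted.values())

    def request(self, thread, mode):
        """Grant only if compatible with holders AND nobody is queued ahead."""
        if not self.waiting and self._compatible_with_granted(mode):
            self.granted[thread] = mode
            return "granted"
        self.waiting.append((thread, mode))
        return "queued"

    def release(self, thread):
        del self.granted[thread]
        # Grant queued requests in arrival order until one conflicts.
        while self.waiting and self._compatible_with_granted(self.waiting[0][1]):
            name, mode = self.waiting.popleft()
            self.granted[name] = mode


# Replay the global-lock requests from the timeline.
global_lock = FifoLock()
r2 = global_lock.request("T2", "IX")  # abort path holds Global IX
r4 = global_lock.request("T4", "X")   # applyOps needs Global X: conflicts, queued
r3 = global_lock.request("T3", "IX")  # compatible with T2's IX, but queued behind T4
print(r2, r4, r3)
```

Without the FIFO rule, [T3] would be granted immediately and the cycle could not form; the rule exists so that a steady stream of intent requests cannot starve an exclusive request such as [T4]'s. Since [T2] never calls `release` in this scenario, nothing in the queue ever moves.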
Together, these steps produce the following deadlock:
- [T2] holds the Global IX, Database IX, Collection X locks and is waiting for [T1] to finish so that it can complete aborting the index build.
- [T1] is waiting for [T3] to finish voting to commit the index build.
- [T4] has a Global X lock request enqueued and had been waiting on it for ~1.989 hours when the test timed out. [T2] prevents this request from being granted because it continues to hold its locks.
- [T3] has a Global IX lock request enqueued behind [T4]'s request and had also been waiting on it for ~1.989 hours when the test timed out.
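The four waits listed above form a cycle in a wait-for graph, which is the usual way to confirm that a set of blocked threads is deadlocked rather than merely slow. The Python sketch below encodes each "waits for" relationship from the list as an edge and finds the cycle with a depth-first search. `find_cycle` and `WAIT_FOR` are illustrative names, not part of the server.

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    wait_for maps each thread to the threads it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in wait_for}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path suffix starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in wait_for:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


WAIT_FOR = {
    "T2": ["T1"],  # abort waits for the index builder to finish
    "T1": ["T3"],  # index builder waits for the vote to complete
    "T3": ["T4"],  # vote's Global IX is queued behind T4's Global X
    "T4": ["T2"],  # T4's Global X is blocked by T2's held locks
}

print(find_cycle(WAIT_FOR))
```

Breaking any single edge removes the deadlock. For example, if [T1]'s wait on [T3] could be interrupted (the direction taken in SERVER-48235), setting `WAIT_FOR["T1"] = []` makes `find_cycle` return `None`.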
Related to:
- SERVER-48235 The primary node should use the AsyncDBClient to vote for committing the index build to allow the request to be interrupted by the IndexBuildsCoordinator (Closed)
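The fix direction in SERVER-48235 is to make the vote request interruptible, so the abort no longer depends on [T3] ever acquiring its lock. The Python sketch below models only that idea with `threading.Condition`: the index builder [T1] waits for a vote result, and the abort path [T2] can interrupt the wait. It does not use or imitate the server's AsyncDBClient API; `InterruptibleVote` and `index_builder` are hypothetical names.

```python
import threading


class InterruptibleVote:
    """Hypothetical stand-in for a vote whose waiter can be interrupted."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False
        self._result = None
        self._interrupted = False

    def complete(self, result):
        """Called when the vote (T3) finishes."""
        with self._cond:
            self._done = True
            self._result = result
            self._cond.notify_all()

    def interrupt(self):
        """Called by the abort path (T2) to release the waiting builder."""
        with self._cond:
            self._interrupted = True
            self._cond.notify_all()

    def wait_for_result(self):
        with self._cond:
            # Re-check both flags after every wakeup (handles spurious wakeups).
            while not self._done and not self._interrupted:
                self._cond.wait()
            if self._interrupted:
                raise InterruptedError("vote interrupted by abort")
            return self._result


def index_builder(vote, outcome):
    """Models T1: wait for the vote, but exit cleanly if interrupted."""
    try:
        outcome.append(("committed", vote.wait_for_result()))
    except InterruptedError:
        outcome.append(("aborted", None))


vote = InterruptibleVote()
outcome = []
t1 = threading.Thread(target=index_builder, args=(vote, outcome))
t1.start()
# The vote (T3) never completes because it is stuck in the lock queue,
# so the abort path interrupts T1 instead of waiting on it forever.
vote.interrupt()
t1.join(timeout=5)
print(outcome)  # [('aborted', None)]
```

Once [T1] exits, [T2] can finish the abort and release its locks, which lets [T4] and then [T3] proceed.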