[SERVER-44722] 3 way deadlock can happen between hybrid index build, prepared transactions and stepdown thread on primary that runs index build via coordinator. Created: 18/Nov/19 Updated: 19/Jul/23 Resolved: 17/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Louis Williams |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Sprint: | Execution Team 2020-05-04 |
| Participants: |
| Description |
|
_buildIndex() is the method that performs the collection scan, drain, and commit phases of the index build. The drain and commit phases take stronger lock modes (the collection lock in S and X mode, respectively). On the master branch, we always run _buildIndex() through the index build coordinator. This means _buildIndex() runs on a spawned thread (an internal/system operation), which is not currently killable by the state transition thread (the stepdown thread).

This can result in a 3-way deadlock where: 1) IndexBuildsCoordinatorMongod-X (the internal thread) is blocked on a prepare conflict while holding the RSTL in IX mode.

Note that the stepdown thread marks the main thread (the user connection thread that performs the "createIndexes" command) as killed, because the main thread previously acquired the RSTL in IX mode. Usually, when the main thread is interrupted by a state transition, it kills the spawned IndexBuildsCoordinatorMongod-X thread, but NOT via the opCtx channel. So there is no way to interrupt the internal thread (i.e., IndexBuildsCoordinatorMongod-X) while it is waiting for the lock.

It seems that even on MongoDB 4.2 we will hit the 3-way deadlock if we set the server startup parameter enableIndexBuildsCoordinatorForCreateIndexesCommand to true. When enableIndexBuildsCoordinatorForCreateIndexesCommand is false, we run the drain and commit phases of the index build on the main thread (the user connection thread that performs the "createIndexes" command), which is always interruptible by the stepdown thread.

Notes: We acquire the collection lock in a stronger mode in order to commit or abort (X) and to drain the side table writes (S). As a result, this can lead to deadlocks involving prepared transactions, stepdown, and the IndexBuildsCoordinator. |
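To make the lock-mode reasoning in the description concrete, here is a minimal Python sketch of the standard multi-granularity lock compatibility matrix (IS, IX, S, X). It is a toy model, not server code: `COMPATIBLE` and `can_grant` are hypothetical names, and the remark that stepdown requests the RSTL in X mode is an assumption that the ticket does not state explicitly.

```python
# Standard multi-granularity lock compatibility: intent-shared (IS),
# intent-exclusive (IX), shared (S), exclusive (X).
# True means the two modes can be held on the same resource at the same time.
COMPATIBLE = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "X"): False,
    ("S", "IS"): True,   ("S", "IX"): False,  ("S", "S"): True,   ("S", "X"): False,
    ("X", "IS"): False,  ("X", "IX"): False,  ("X", "S"): False,  ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)


# Drain phase: a collection S request must wait while another operation
# holds the collection in IX.
print(can_grant("S", ["IX"]))
# Assumption: stepdown requests the RSTL in X, so it must wait while the
# index build thread still holds the RSTL in IX.
print(can_grant("X", ["IX"]))
# Two intent-exclusive holders can coexist.
print(can_grant("IX", ["IX"]))
```

The point of the sketch is only that S and X are incompatible with IX, which is why moving the drain and commit phases onto a thread that cannot be interrupted is enough to close a wait cycle.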
| Comments |
| Comment by Louis Williams [ 17/Apr/20 ] |
|
This is fixed by |
| Comment by Louis Williams [ 22/Nov/19 ] |
|
This is really only possible when using a 4.3 binary with two-phase index builds disabled. This code, while present in 4.2, is not exercised. I filed |
| Comment by Louis Williams [ 18/Nov/19 ] |
|
On stepdown, for single-phase builds, we call abortIndexBuildByBuildUUID which does nothing more than set a flag on the MultiIndexBlock. We may need to reconsider how aborting an index build operates, and interrupt through the OperationContext instead. At the moment, the build thread only checks if it has been aborted at a few points in the index build process, and definitely not while acquiring locks. |
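The difference between setting an abort flag and interrupting through the operation's own wait can be sketched with a toy Python model. Everything here is hypothetical and only illustrates the idea: `FlagOnlyBuild`, `InterruptibleBuild`, `Interrupted`, and `run_waiter` are invented names, and the short timeout stands in for a lock wait that would otherwise block indefinitely.

```python
import threading


class Interrupted(Exception):
    """Raised when a waiter is killed while blocked."""


class FlagOnlyBuild:
    """Abort only sets a flag; the lock wait never consults it."""

    def __init__(self):
        self.aborted = False
        self.lock_available = threading.Event()  # never set in this model

    def abort(self):
        self.aborted = True  # nobody blocked in wait_for_lock sees this

    def wait_for_lock(self, timeout):
        # Blocks until the lock is available or the timeout expires.
        return self.lock_available.wait(timeout)


class InterruptibleBuild:
    """Kill wakes the waiter, similar in spirit to an interruptible opCtx."""

    def __init__(self):
        self._cond = threading.Condition()
        self._killed = False
        self._lock_available = False  # never set in this model

    def kill(self):
        with self._cond:
            self._killed = True
            self._cond.notify_all()  # wake any thread blocked in wait_for_lock

    def wait_for_lock(self, timeout):
        with self._cond:
            self._cond.wait_for(
                lambda: self._killed or self._lock_available, timeout
            )
            if self._killed:
                raise Interrupted("killed while waiting for lock")
            return self._lock_available


def run_waiter(build, stop, timeout=0.5):
    """Start a waiter thread, call `stop`, and report how the wait ended."""
    result = {}

    def worker():
        try:
            granted = build.wait_for_lock(timeout)
            result["outcome"] = "granted" if granted else "timed_out"
        except Interrupted:
            result["outcome"] = "interrupted"

    thread = threading.Thread(target=worker)
    thread.start()
    stop()
    thread.join()
    return result["outcome"]


flag_build = FlagOnlyBuild()
print(run_waiter(flag_build, flag_build.abort))

interruptible_build = InterruptibleBuild()
print(run_waiter(interruptible_build, interruptible_build.kill))
```

In the flag-only model the waiter stays blocked until its timeout even though the abort was requested, while the interruptible model unblocks as soon as it is killed. That is the behavior the comment suggests adopting for lock acquisition.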
| Comment by Suganthi Mani [ 18/Nov/19 ] |
|
This bug was caught during |