[SERVER-44722] 3 way deadlock can happen between hybrid index build, prepared transactions and stepdown thread on primary that runs index build via coordinator. Created: 18/Nov/19  Updated: 19/Jul/23  Resolved: 17/Apr/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Louis Williams
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-44791 Abort index builds by interrupting th... Closed
is depended on by SERVER-43216 Invariant internal operations that ac... Closed
Duplicate
duplicates SERVER-46989 Index builds should hold RSTL to prev... Closed
Related
related to SERVER-42621 3 way deadlock can happen between hyb... Closed
related to SERVER-78662 Deadlock with index build, step down,... Closed
is related to SERVER-46704 Two phase index build can violate loc... Closed
is related to SERVER-71191 Deadlock between index build setup, p... Closed
is related to SERVER-71198 Assert that unkillable operations tha... Backlog
Operating System: ALL
Steps To Reproduce:

load("jstests/libs/check_log.js");
load("jstests/replsets/rslib.js");
load("jstests/core/txns/libs/prepare_helpers.js");
 
(function() {
 
"use strict";
 
const dbName = "test";
const collName = "coll";
 
const rst = new ReplSetTest({nodes: 1});
rst.startSet();
rst.initiate();
 
const primary = rst.getPrimary();
const primaryDB = primary.getDB(dbName);
const primaryColl = primaryDB[collName];
const collNss = primaryColl.getFullName();
 
TestData.dbName = dbName;
TestData.collName = collName;
 
jsTestLog("Do a document write");
assert.commandWorked(primaryColl.insert({_id: 1, x: 1}, {"writeConcern": {"w": 1}}));
 
// Clear the log.
assert.commandWorked(primary.adminCommand({clearLog: 'global'}));
 
// Enable the fail point that makes the hybrid index build hang.
assert.commandWorked(primary.adminCommand(
    {configureFailPoint: "hangAfterIndexBuildDumpsInsertsFromBulk", mode: "alwaysOn"}));
 
const indexThread = startParallelShell(() => {
    jsTestLog("Create index");
    const primaryDB = db.getSiblingDB(TestData.dbName);
    assert.commandFailedWithCode(primaryDB[TestData.collName].createIndex({"x": 1}),
                                 ErrorCodes.InterruptedDueToReplStateChange);
}, primary.port);
 
// Wait for the index build to reach the hangAfterIndexBuildDumpsInsertsFromBulk fail point.
checkLog.contains(primary, "Hanging after dumping inserts from bulk builder");
 
jsTestLog("Start txn");
const session = primary.startSession();
const sessionDB = session.getDatabase(dbName);
const sessionColl = sessionDB.getCollection(collName);
session.startTransaction();
assert.commandWorked(sessionColl.insert({x: 1}));
 
jsTestLog("Prepare txn");
const prepareTimestamp = PrepareHelpers.prepareTransaction(session);
 
assert.commandWorked(primary.adminCommand(
    {configureFailPoint: "hangAfterIndexBuildDumpsInsertsFromBulk", mode: "off"}));
 
const stepDownThread = startParallelShell(() => {
    jsTestLog("Make primary to step down");
    assert.commandWorked(db.adminCommand({"replSetStepDown": 60 * 60, "force": true}));
}, primary.port);
 
// Wait for threads to join.
indexThread();
stepDownThread();
 
waitForState(primary, ReplSetTest.State.SECONDARY);
// Allow the primary to be re-elected, and wait for it.
assert.commandWorked(primary.adminCommand({replSetFreeze: 0}));
rst.getPrimary();
 
jsTestLog("Abort txn");
assert.commandWorked(session.abortTransaction_forTesting());
 
rst.stopSet();
})();

Sprint: Execution Team 2020-05-04
Participants:

 Description   

_buildIndex() is the method that performs the collection scan, drain, and commit phases of the index build. The drain and commit phases take stronger lock modes (the collection lock in S and X mode, respectively). On the master branch, we always run _buildIndex() through the index build coordinator. This means _buildIndex() runs on a spawned thread (an internal/system operation), and such threads are not currently killable by the state transition thread (step down thread). This can result in a 3-way deadlock where:

1) IndexBuildsCoordinatorMongod-X (internal thread) is blocked on a prepare conflict while holding the RSTL in IX mode.
2) Step down enqueues an RSTL request in X mode and is blocked behind the IndexBuildsCoordinatorMongod-X thread.
3) The commitTransaction command is waiting to acquire the RSTL in IX mode but is blocked behind the step down thread.
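
The three steps above form a cycle in a wait-for graph. The following is a minimal sketch in plain JavaScript, not server code; the node names are illustrative labels chosen here, and the edge from the index build to the prepared transaction assumes that a prepare conflict only clears once the transaction commits or aborts.

```javascript
// Wait-for edges taken from the three numbered steps above.
// Node names are illustrative labels, not server identifiers.
const waitsFor = {
    // Blocked on a prepare conflict while holding the RSTL in IX mode.
    indexBuildThread: ["preparedTxn"],
    // RSTL X request is queued behind the IX holder.
    stepDownThread: ["indexBuildThread"],
    // commitTransaction needs the RSTL in IX mode, queued behind the X request.
    preparedTxn: ["stepDownThread"],
};

// Depth-first search that returns the first cycle found, or null if none.
function findCycle(graph) {
    const visiting = new Set();
    const done = new Set();
    const path = [];

    function visit(node) {
        if (done.has(node)) {
            return null;
        }
        if (visiting.has(node)) {
            // Back edge: the cycle is the path from the first visit of `node`.
            return path.slice(path.indexOf(node)).concat(node);
        }
        visiting.add(node);
        path.push(node);
        for (const next of graph[node] || []) {
            const cycle = visit(next);
            if (cycle) {
                return cycle;
            }
        }
        path.pop();
        visiting.delete(node);
        done.add(node);
        return null;
    }

    for (const node of Object.keys(graph)) {
        const cycle = visit(node);
        if (cycle) {
            return cycle;
        }
    }
    return null;
}

const deadlock = findCycle(waitsFor);
console.log(deadlock ? deadlock.join(" -> ") : "no cycle");
```

Removing any one edge breaks the cycle. For example, if the index build thread could be interrupted while it waits, it would no longer wait on the prepared transaction.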

Note that the step down thread marks the main thread (the user connection thread that performs the "createIndexes" command) as killed because the main thread previously acquired the RSTL in IX mode. Usually, when the main thread is interrupted by a state transition, it kills the spawned IndexBuildsCoordinatorMongod-X thread, but NOT via the opCtx channel. So there is no way to interrupt the internal thread (i.e., IndexBuildsCoordinatorMongod-X) while it is waiting for the lock.

It seems that even on MongoDB 4.2 we will hit the 3-way deadlock if we set the server startup parameter enableIndexBuildsCoordinatorForCreateIndexesCommand to true. When "enableIndexBuildsCoordinatorForCreateIndexesCommand" is false, we run the drain and commit index build phases on the main thread (the user connection thread that performs the "createIndexes" command), which is always interruptible by the step down thread.

Notes: We acquire the collection lock in a stronger mode in order to commit / abort (X) and to drain the side table writes (S). As a result, this can lead to deadlocks involving prepared transactions, stepdown, and the index builds coordinator.
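
To illustrate why the stronger modes matter, here is a small sketch of the standard intent-lock compatibility matrix (IS, IX, S, X) in plain JavaScript. It is not the server's lock manager. The assumption that a prepared transaction's write holds the collection lock in IX mode is made here for illustration only.

```javascript
// Standard multi-granularity lock compatibility.
// compatible[held][requested] is true when both modes can be granted together.
const compatible = {
    IS: {IS: true, IX: true, S: true, X: false},
    IX: {IS: true, IX: true, S: false, X: false},
    S: {IS: true, IX: false, S: true, X: false},
    X: {IS: false, IX: false, S: false, X: false},
};

// Returns true when `requested` must wait behind a holder in mode `held`.
function conflicts(held, requested) {
    return !compatible[held][requested];
}

// Assumption: the prepared transaction holds the collection lock in IX mode.
console.log("drain (S) vs. txn (IX):", conflicts("IX", "S"));
console.log("commit/abort (X) vs. txn (IX):", conflicts("IX", "X"));
// The same matrix applies to the RSTL: step down's X conflicts with a held IX.
console.log("step down (X) vs. RSTL holder (IX):", conflicts("IX", "X"));
// Two IX holders never block each other.
console.log("IX vs. IX:", conflicts("IX", "IX"));
```

Under these assumptions, only the S and X requests wait behind an IX holder, which is why the drain and commit phases are the ones that can take part in the deadlock.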



 Comments   
Comment by Louis Williams [ 17/Apr/20 ]

This is fixed by SERVER-46989.

Comment by Louis Williams [ 22/Nov/19 ]

This is really only possible when using a 4.3 binary and two-phase index builds are disabled. This code, while present in 4.2, is not exercised.

I filed SERVER-44791 to allow aborting index builds through the OperationContext mechanism so that builds blocked on lock acquisitions can be killed.

Comment by Louis Williams [ 18/Nov/19 ]

On stepdown, for single-phase builds, we call abortIndexBuildByBuildUUID which does nothing more than set a flag on the MultiIndexBlock. We may need to reconsider how aborting an index build operates, and interrupt through the OperationContext instead. At the moment, the build thread only checks if it has been aborted at a few points in the index build process, and definitely not while acquiring locks.

Comment by Suganthi Mani [ 18/Nov/19 ]

This bug was caught during SERVER-43216.
