[SERVER-42621] 3 way deadlock can happen between hybrid index build, prepared transactions and stepdown thread. Created: 03/Aug/19  Updated: 29/Oct/23  Resolved: 28/Aug/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File abort1.js    
Issue Links:
Backports
Related
related to SERVER-42869 IndexBuildInterceptor rollback handle... Closed
related to SERVER-42824 do not lock RSTL for index build cleanup Closed
related to SERVER-44190 Adapt hybrid_index_build_on_state_tra... Closed
related to SERVER-42799 obtain timestamp for cleaning up inde... Closed
related to SERVER-42800 skip size adjustments on temporary re... Closed
related to SERVER-43837 remove RSTL unlocking logic from Mult... Closed
is related to SERVER-44722 3 way deadlock can happen between hyb... Closed
is related to SERVER-46704 Two phase index build can violate loc... Closed
is related to SERVER-71191 Deadlock between index build setup, p... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Steps To Reproduce:

/*
 * 3 way deadlock - Hybrid index builder, prepared txn and step down thread.
 */
load("jstests/libs/check_log.js");
load("jstests/core/txns/libs/prepare_helpers.js");
 
(function() {
 
"use strict";
 
const testName = "hybridIndexBuildRepro";
const dbName = "test";
const collName = "coll";
 
var rst = new ReplSetTest({name: testName, nodes: [{}, {rsConfig: {priority: 0}}]});
rst.startSet();
rst.initiate();
 
var primary = rst.getPrimary();
const primaryDB = primary.getDB(dbName);
const primaryAdmin = primary.getDB("admin");
const primaryColl = primaryDB[collName];
const collNss = primaryColl.getFullName();
 
TestData.dbName = dbName;
TestData.collName = collName;
 
jsTestLog("1. Do a document write");
assert.writeOK(
        primaryColl.insert({_id: 1, x:1}, {"writeConcern": {"w": "majority"}}));
rst.awaitReplication();
 
// Clear the log.
assert.commandWorked(primaryAdmin.runCommand({clearLog: 'global'}));
 
// Enable failpoint which makes hybrid index build build to hang.
assert.commandWorked(primaryAdmin.runCommand(
        {configureFailPoint: "hangAfterIndexBuildDumpsInsertsFromBulk", mode: "alwaysOn"}));
 
const indexThread = startParallelShell(() => {
    jsTestLog("Create index");
    var primaryDB = db.getSiblingDB(TestData.dbName);
    assert.commandFailed(primaryDB[TestData.collName].createIndex({"x": 1}));
}, primary.port);
 
rst.awaitReplication();
 
// Wait for hangAfterIndexBuildDumpsInsertsFromBulk failpoint to reach.
checkLog.contains(primary, "Hanging after dumping inserts from bulk builder");
 
jsTestLog("Start txn");
let session = primary.startSession({causalConsistency: false});
let sessionDB = session.getDatabase(dbName);
const sessionColl = sessionDB.getCollection(collName);
session.startTransaction();
assert.commandWorked(sessionColl.insert({x: 1}, {$set: {y: 1}}));
 
jsTestLog("Prepare txn");
let prepareTimestamp = PrepareHelpers.prepareTransaction(session);
 
const stepDownThread = startParallelShell(() => {
    jsTestLog("Make primary to step down");
    assert.commandWorked(db.adminCommand({"replSetStepDown": 60*60, "force": true}));
}, primary.port);
 
checkLog.contains(primary, "Starting to kill user operations");
 
assert.commandWorked(primaryAdmin.runCommand(
        {configureFailPoint: "hangAfterIndexBuildDumpsInsertsFromBulk", mode: "off"}));
 
// Wait for threads to join.
indexThread();
stepDownThread();
 
rst.waitForState(primary, ReplSetTest.State.SECONDARY);
// Unfreeze the original primary so that it can stand for election again for the next test.
assert.commandWorked(primary.adminCommand({replSetFreeze: 0}));
// Make the primary to reelect again.
assert.commandWorked(primary.adminCommand({replSetStepUp: 1}));
primary = rst.getPrimary();
 
jsTestLog("commit txn");
assert.commandWorked(PrepareHelpers.commitTransaction(session, prepareTimestamp));
 
 
rst.stopSet();
})();

Sprint: Execution Team 2019-08-12, Repl 2019-08-26, Repl 2019-09-09
Participants:
Linked BF Score: 7

 Description   

Currently, we can see a 3 way deadlock between hybrid index builder, prepared txn and step down thread for the above repro. The problem is that when step down thread kills "createIndex" cmd thread. As part of index teardown step, on primary, MultiIndexBlock::cleanUpAfterBuild is called with RSTL held in IX mode which then tries to acquire X lock on user collection in an uninterruptible lock guard but gets blocked behind prepared transaction due to collection lock conflict. Since createIndex is holding RSTL in IX mode, it blocks step down thread. CommitTransaction cmd waiting to acquire RSTL lock in IX mode gets blocked behind the step down thread as the step down thread has enqueued RSTL lock in X mode.



 Comments   
Comment by Githook User [ 31/Oct/19 ]

Author:

{'username': 'smani87', 'email': 'suganthi.mani@mongodb.com', 'name': 'Suganthi Mani'}

Message: SERVER-42621 Added test that running hybrid index build does not lead to 3 way deadlock.

(cherry picked from commit 3693ad5f9031c59d8a0646337f5c3bb3c818d49b)
Branch: v4.2
https://github.com/mongodb/mongo/commit/cd068c8d8b7c1196ff7985e06248b066f97d7028

Comment by Githook User [ 28/Aug/19 ]

Author:

{'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}

Message: SERVER-42621 Added test that running hybrid index build does not lead to 3 way deadlock.
Branch: master
https://github.com/mongodb/mongo/commit/3693ad5f9031c59d8a0646337f5c3bb3c818d49b

Comment by Suganthi Mani [ 21/Aug/19 ]

Spoke to benety.goh offline,  and we agreed that this parent ticket will be used to add the testing(repro script) for the 3 way deadlock and will be pushed to master after all the underlying sub-tickets are addressed.

Comment by Benety Goh [ 19/Aug/19 ]

suganthi.mani, thanks for the detailed analysis. I've filed SERVER-42869 for the IndexBuildInterceptor work.

Comment by Suganthi Mani [ 16/Aug/19 ]

benety.goh, found the reason that was causing memory corruption in the patch build. When the transaction gets aborted, we also try to rollback the changes that we did during the txn to IndexBuildInterceptor::_sideWritesCounter. But, the problem is, we are accessing the member of an object instance (i.e.) _indexBuildInterceptor that was dropped/destructed as part of index build cleanup.

Comment by Eric Milkie [ 09/Aug/19 ]

I'm having a think on ways to mitigate the tangle between index build cleanup and prepared transactions that have registered Changes that attempt to access the index build's temporary tables.

Comment by Suganthi Mani [ 08/Aug/19 ]

Currently, there is a problem of dropping RSTL lock by index cleanup step. Though, the index cleanup step doesn't need synchronization with replica state transition but it depends on prepared transaction. I am attaching the abort.js script to this ticket which simulate the below scenario.

1) Prepared txn does some writes to the index builder internal sideWrites table  and performs size adjustment to the sideWrites table by holding user collection lock in IX mode.
2) Step down kills the index build and yields the locks (global to user collection lock) held by the prepared txn.
3) As a result, index build clean up is able to acquire the user collection lock in X mode and able to drop the sideWrites table.
4) Now, if we try to abort txn, we try to revert the data size changes we did on step #1 for sideWrites table. =====> This triggers seg fault, as we are trying to revert the data size changes for the dropped sideWrites table.

Comment by Eric Milkie [ 06/Aug/19 ]

I think the best thing to do here is to not hold the RSTL, since index building doesn't need synchronization with replica state transitions (it is handled specially). Removing the UninterruptibleLockGuard for index build abort isn't possible in the current code (I tried and failed).

Comment by Judah Schvimer [ 05/Aug/19 ]

milkie, I've moved this to the execution team since I expect the solution to be in the indexing code, but I'm happy to help brainstorm other solutions and take it back onto the replication team if the solution lies outside of the indexing code.

Comment by Judah Schvimer [ 05/Aug/19 ]

I feel like the best fix here will be to remove the UninterruptibleLockGuard. milkie, Is that possible?

The alternative I can imagine (though haven't fully thought through) is for all of index building on the primary, after writing the oplog entry, to not hold the RSTL. We could drop it like we do for prepared transactions as soon as we no longer need the node to be a primary. We made a similar fix to index building on secondaries.

Comment by Suganthi Mani [ 05/Aug/19 ]

This is a problem even on 4.2 as MultiIndexBlock::cleanUpAfterBuild() tries to acquire X lock for user collection with RSTL held in IX mode in an UninterruptibleLockGuard.
CC milkie

Generated at Thu Feb 08 05:00:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.