[SERVER-62951] Deadlock on restarting the node with a prepared transaction acquiring the global lock Created: 25/Jan/22  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Moustafa Maher Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File deadlock.patch    
Issue Links:
Depends
Problem/Incident
is caused by SERVER-56644 local.system.resharding.slimOplogForG... Closed
Related
related to SERVER-67605 Make retryable_findAndModify_commit_a... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:
Linked BF Score: 12

 Description   

Acquiring the global lock in _initAndListen() while mongod is still initializing, after replication has already been started, is dangerous: we might reconstruct a prepared transaction that acquires the global lock and keeps it, which will cause a deadlock.
You can repro it by adding a sleep before this line and running jstests/replsets/ddl_op_behind_transaction_fails_in_shutdown.js



 Comments   
Comment by Max Hirschhorn [ 03/Feb/22 ]

I confirmed the hang only happens with {nodes: 1}. I tried it both with a single-voting-node replica set and with a three-node replica set (one voting node and two non-voting nodes, per the diff below).

Sending this over to the Replication team because it doesn't appear this hang can happen in a production setting. Still, I find it surprising that a mongod can transition to member state PRIMARY while _initAndListen() hasn't even started accepting incoming connections.

diff --git a/jstests/replsets/ddl_op_behind_transaction_fails_in_shutdown.js b/jstests/replsets/ddl_op_behind_transaction_fails_in_shutdown.js
index e682134cad2..8806ea62e65 100644
--- a/jstests/replsets/ddl_op_behind_transaction_fails_in_shutdown.js
+++ b/jstests/replsets/ddl_op_behind_transaction_fails_in_shutdown.js
@@ -22,7 +22,8 @@ load("jstests/core/txns/libs/prepare_helpers.js");
 load("jstests/libs/parallel_shell_helpers.js");
 load('jstests/libs/test_background_ops.js');
 
-const rst = new ReplSetTest({nodes: 1});
+const rst = new ReplSetTest(
+    {nodes: [{}, {rsConfig: {votes: 0, priority: 0}}, {rsConfig: {votes: 0, priority: 0}}]});
 rst.startSet();
 rst.initiate();
 
@@ -114,5 +115,7 @@ assert.commandFailedWithCode(primary.getDB(dbName).runCommand({
                              ErrorCodes.MaxTimeMSExpired);
 
 // Skip validation because it requires a lock that the prepared transaction is blocking.
-rst.stopSet(true /*use default exit signal*/, false /*forRestart*/, {skipValidation: true});
+rst.stopSet(true /*use default exit signal*/,
+            false /*forRestart*/,
+            {skipValidation: true, skipCheckDBHashes: true});
 })();
diff --git a/src/mongo/db/mongod_main.cpp b/src/mongo/db/mongod_main.cpp
index fea45167926..ab5e01268b9 100644
--- a/src/mongo/db/mongod_main.cpp
+++ b/src/mongo/db/mongod_main.cpp
@@ -736,6 +736,8 @@ ExitCode _initAndListen(ServiceContext* serviceContext, int listenPort) {
         }
 
         if (replSettings.usingReplSets()) {
+            sleepsecs(1);
+            LOGV2(0, "About to create view on donor's oplog");
             Lock::GlobalWrite lk(startupOpCtx.get());
             OldClientContext ctx(startupOpCtx.get(), NamespaceString::kRsOplogNamespace.ns());
             tenant_migration_util::createOplogViewForTenantMigrations(startupOpCtx.get(), ctx.db());

Comment by Moustafa Maher [ 02/Feb/22 ]

daniel.gottlieb The analysis of max.hirschhorn is correct!

Comment by Max Hirschhorn [ 31/Jan/22 ]

Are we preparing transactions in a background thread at startup?

daniel.gottlieb, it looks like the transactions are put into the prepared state synchronously at startup by ReplicationCoordinatorImpl::_startLoadLocalConfig(). However, locks for prepared transactions are released rather than stashed when a transaction is applied as a secondary (SERVER-37199). This means that, in the absence of other threads, the global X lock can be acquired immediately.

What adding the sleep enables is for the node to transition to member state PRIMARY and reacquire the locks for the prepared transaction before _initAndListen() reaches its global lock acquisition.

I'm not really sure where the best place to move creating the oplog view for tenant migrations is. Would we want to do it before starting up the ReplicationCoordinatorImpl?

Comment by Daniel Gottlieb (Inactive) [ 25/Jan/22 ]

m.maher, can you comment on why the sleep matters? Are we preparing transactions in a background thread at startup? I'd expect deadlocks at startup to be deterministic.

Generated at Thu Feb 08 05:56:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.