[SERVER-69861] Uninterruptible lock guard in election causes FCBIS to hang Created: 21/Sep/22  Updated: 29/Oct/23  Resolved: 04/Oct/22

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 6.0.3, 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.1, v6.0
Sprint: Repl 2022-10-03, Repl 2022-10-17
Participants:

 Description   

In order to avoid interrupting ourselves due to our own stepdown, we use uninterruptible locks when writing lastVote in an election. Unfortunately, if we've taken the global lock in X (exclusive) mode for the FCBIS (file copy based initial sync) storage change, this leads to a deadlock: we're trying to acquire a write lock on an uninterruptible opCtx while also attempting to kill that same opCtx so we can change storage and release the lock.

The quick fix might be to not acquire the uninterruptible lock guard during STARTUP2, but we should definitely add a test for this: since initial sync nodes are usually non-voting, we don't have coverage for an election during initial sync.
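The deadlock can be modeled in miniature. The Python sketch below is a hypothetical model, not MongoDB code. A plain threading.Lock stands in for the global lock held in X mode, an Event stands in for killing the election's opCtx, and the names simulate_election_during_storage_change and use_uninterruptible_guard are invented for illustration. The storage-change side waits only a bounded time for the election op to go away, so the uninterruptible case reports a would-be hang instead of actually hanging.

```python
import threading


def use_uninterruptible_guard(member_state: str) -> bool:
    # Hypothetical model of the proposed quick fix: skip the guard in STARTUP2.
    return member_state != "STARTUP2"


def simulate_election_during_storage_change(uninterruptible: bool,
                                            wait_secs: float = 0.5) -> bool:
    """Return True if the storage change can proceed (the election op yielded),
    False if it would have hung waiting for the election op to finish."""
    global_lock = threading.Lock()     # stands in for the global lock
    killed = threading.Event()         # stands in for killing the election opCtx
    election_done = threading.Event()

    def write_last_vote() -> None:
        # Election thread: needs the global lock to write lastVote.
        while True:
            if global_lock.acquire(timeout=0.01):
                global_lock.release()  # "write" lastVote, then drop the lock
                break
            if killed.is_set() and not uninterruptible:
                break                  # interruptible: give up once killed
            # uninterruptible: ignore the kill and keep waiting for the lock
        election_done.set()

    global_lock.acquire()              # storage change holds the lock in "X mode"
    election = threading.Thread(target=write_last_vote)
    election.start()
    killed.set()                       # kill the election op ...
    finished = election_done.wait(wait_secs)  # ... and wait for it to go away
    global_lock.release()              # release so the demo thread can exit
    election.join()
    return finished
```

For example, `simulate_election_during_storage_change(use_uninterruptible_guard("STARTUP2"))` takes the interruptible path and returns True, while passing `uninterruptible=True` returns False. Whether the real fix took exactly this form is not stated in this ticket.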



 Comments   
Comment by Githook User [ 28/Oct/22 ]

Author:

Matthew Russotto <matthew.russotto@mongodb.com> (username: mtrussotto)

Message: SERVER-69861 Uninterruptible lock guard in election causes FCBIS to hang
Support for reading last vote in data_replicator_external_state

(cherry picked from commit 810d5c1f2b0f8d3767df55812c3324d6171aa107)
Branch: v6.0
https://github.com/mongodb/mongo/commit/6c1b9191fd8fc814aae17b0c99785983c190f5bf

Comment by Githook User [ 28/Oct/22 ]

Author:

Matthew Russotto <matthew.russotto@mongodb.com> (username: mtrussotto)

Message: SERVER-69861 Uninterruptible lock guard in election causes FCBIS to hang

Avoid an unusual scenario in which, if a node was being resynced (not added initially)
with FCBIS, an election occurred along with a particular sequence of network
splits, and the node was restarted directly after FCBIS completed, the node could
vote twice for different candidates in the same term.

(cherry picked from commit b12d8de0d90fbec0fc180dfa88ec4f77a586f0a4)
Branch: v6.0
https://github.com/10gen/mongo-enterprise-modules/commit/4b242224845c6bc89c47adead1f7784051e80762
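The double-vote hazard in the commit message above comes down to the Raft-style rule that a node may vote for at most one candidate per term, which only holds if the node's lastVote survives restarts. The sketch below is a hypothetical model resting on a loudly stated assumption: that the FCBIS storage change replaces the node's data with files that do not record the vote it cast during the resync. grant_vote, swap_storage and votes_twice are invented names, and carrying lastVote across the swap is one way to preserve the invariant, not necessarily how the actual fix works.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LastVote:
    term: int
    candidate: str


def grant_vote(storage: Dict[str, LastVote], term: int, candidate: str) -> bool:
    """Raft-style vote check against the durably stored lastVote."""
    last: Optional[LastVote] = storage.get("lastVote")
    if last is not None:
        if term < last.term:
            return False  # stale term
        if term == last.term and candidate != last.candidate:
            return False  # already voted for someone else this term
    storage["lastVote"] = LastVote(term, candidate)  # persist before replying
    return True


def swap_storage(old: Dict[str, LastVote], new: Dict[str, LastVote],
                 carry_last_vote: bool) -> Dict[str, LastVote]:
    """Hypothetical FCBIS storage change: replace the node's data with synced files."""
    if carry_last_vote and "lastVote" in old:
        prev = new.get("lastVote")
        # Keep the node's own vote unless the synced files record a later term.
        if prev is None or old["lastVote"].term >= prev.term:
            new["lastVote"] = old["lastVote"]
    return new


def votes_twice(carry_last_vote: bool) -> bool:
    storage: Dict[str, LastVote] = {}
    first = grant_vote(storage, term=7, candidate="A")  # vote cast during the resync
    synced: Dict[str, LastVote] = {}  # assumption: copied files lack that vote
    storage = swap_storage(storage, synced, carry_last_vote)
    # After a restart the node only knows what is in storage.
    second = grant_vote(storage, term=7, candidate="B")
    return first and second
```

In this model `votes_twice(False)` returns True (the node votes for both A and B in term 7), while `votes_twice(True)` returns False because the carried-over lastVote makes the second request fail.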

Comment by Githook User [ 04/Oct/22 ]

Author:

Matthew Russotto <matthew.russotto@mongodb.com> (username: mtrussotto)

Message: SERVER-69861 Uninterruptible lock guard in election causes FCBIS to hang
Support for reading last vote in data_replicator_external_state
Branch: master
https://github.com/mongodb/mongo/commit/810d5c1f2b0f8d3767df55812c3324d6171aa107

Comment by Githook User [ 04/Oct/22 ]

Author:

Matthew Russotto <matthew.russotto@mongodb.com> (username: mtrussotto)

Message: SERVER-69861 Uninterruptible lock guard in election causes FCBIS to hang

Avoid an unusual scenario in which, if a node was being resynced (not added initially)
with FCBIS, an election occurred along with a particular sequence of network
splits, and the node was restarted directly after FCBIS completed, the node could
vote twice for different candidates in the same term.
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/b12d8de0d90fbec0fc180dfa88ec4f77a586f0a4

Generated at Thu Feb 08 06:14:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.