Details
- Type: Bug
- Status: Closed
- Priority: Major - P3
- Resolution: Fixed
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
- Backport Requested: v4.4, v4.2
- Sprint: Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-08
Description
There is a deadlock between the thread running the stepdown process and the session catalog migration producer. More concretely:
1. The thread running invalidateSessionsForStepdown holds a lock (the RSTL) and is blocked on a condition variable, waiting to check out a session.
2. The session catalog migration thread is blocked here, waiting to acquire the lock held by [1]; it will never get it, because this thread is also the one that is supposed to check out the session and notify [1].
The thread holding the RSTL lock on version 4.4 might have a stacktrace like the following:
#0 0x00007f1e44d01c3d in poll () from /lib64/libc.so.6
#1 0x000056130ba24f87 in mongo::transport::TransportLayerASIO::BatonASIO::run(mongo::ClockSource*) ()
#2 0x000056130ba0623d in mongo::transport::TransportLayerASIO::BatonASIO::run_until(mongo::ClockSource*, mongo::Date_t) ()
#3 0x000056130bef5821 in mongo::ClockSource::waitForConditionUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t, mongo::Waitable*) ()
#4 0x000056130beeacd0 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t) ()
#5 0x000056130bea0795 in _ZZN5mongo13Interruptible32waitForConditionOrInterruptUntilISt11unique_lockINS_12latch_detail5LatchEEZNS_28CondVarLockGrantNotification4waitEPNS_16OperationContextENS_8DurationISt5ratioILl1ELl1000EEEEEUlvE_EEbRNS_4stdx18condition_variableERT_NS_6Date_tET0_PNS_10AtomicWordIlEEENKUlSJ_NS0_9WakeSpeedEE1_clESJ_SO_ ()
#6 0x000056130bea0daf in mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >) ()
#7 0x000056130bea29c6 in mongo::LockerImpl::_lockComplete(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t) ()
#8 0x000056130beab773 in mongo::repl::ReplicationStateTransitionLockGuard::waitForLockUntil(mongo::Date_t) ()
#9 0x000056130a3269f7 in mongo::repl::ReplicationCoordinatorImpl::AutoGetRstlForStepUpStepDown::AutoGetRstlForStepUpStepDown(mongo::repl::ReplicationCoordinatorImpl*, mongo::OperationContext*, mongo::repl::ReplicationCoordinator::OpsKillingStateTransitionEnum, mongo::Date_t) ()
#10 0x000056130a34bee9 in mongo::repl::ReplicationCoordinatorImpl::_stepDownFinish(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::executor::TaskExecutor::EventHandle const&) ()
...
The other thread's stacktrace may vary depending on the operation; however, there will be a chunk migration thread blocked in the session migration step (most likely in the SessionCatalogMigrationDestination class).
Attachments
Issue Links
- causes
  - SERVER-57756 Race between concurrent stepdowns and applying transaction oplog entry (Closed)
- related to
  - SERVER-55007 Deadlock between step down and MongoDOperationContextSession (Closed)
  - SERVER-60161 Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command (Closed)
  - SERVER-57167 Prevent throwing on session creation due to stepdown before stepdown completes (Closed)