Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.9.0, 4.2.14, 4.4.6
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-csrs-stepdown-also
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4, v4.2
Sprint:
Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-08
Case:
Linked BF Score:
18
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

There is a deadlock between the thread performing the stepdown and the session catalog migration producer. More concretely:
1. The thread running invalidateSessionsForStepdown holds the RSTL and is waiting on a condition variable to check out a session.
2. The session catalog migration thread is blocked waiting to acquire the lock held by [1], but it will never get it, because this thread is also the one that should check out the session and notify [1].
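
The cycle described in the two steps above can be reproduced in miniature. The following is only a sketch built on standard C++ primitives, not MongoDB code: `simulateStepdownDeadlock`, `DeadlockResult`, and every variable name are hypothetical stand-ins, and timeouts replace the indefinite waits so that the example terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for illustration only; none of these are MongoDB types.
struct DeadlockResult {
    bool stepdownGotSession;  // did the stepdown side ever see the session released?
    bool migrationGotRstl;    // did the migration side ever acquire the "RSTL"?
};

// Models the cycle: the caller plays the stepdown thread, a spawned thread
// plays the session catalog migration thread.
DeadlockResult simulateStepdownDeadlock(std::chrono::milliseconds timeout) {
    std::timed_mutex rstl;                    // stands in for the RSTL
    std::mutex sessionMutex;                  // protects sessionCheckedOut
    std::condition_variable sessionReleased;  // signalled when the session is released
    bool sessionCheckedOut = true;            // the migration thread owns the session
    DeadlockResult result{false, false};

    // [1] Stepdown side: take the "RSTL" first and keep it for the whole function.
    std::unique_lock<std::timed_mutex> rstlLock(rstl);

    // [2] Migration side: it must get the "RSTL" before it can release the session.
    std::thread migration([&] {
        if (!rstl.try_lock_for(timeout)) {
            return;  // an untimed acquire would block here forever
        }
        result.migrationGotRstl = true;
        rstl.unlock();
        {
            std::lock_guard<std::mutex> lk(sessionMutex);
            sessionCheckedOut = false;
        }
        sessionReleased.notify_all();
    });

    // [1] continued: wait for the session while still holding the "RSTL".
    {
        std::unique_lock<std::mutex> lk(sessionMutex);
        result.stepdownGotSession =
            sessionReleased.wait_for(lk, timeout, [&] { return !sessionCheckedOut; });
    }

    migration.join();  // the "RSTL" is still held here, so [2] can never succeed
    return result;
}
```

Calling `simulateStepdownDeadlock(std::chrono::milliseconds(50))` returns with both fields `false`: neither side makes progress, which is the timed equivalent of the hang reported here. The sketch says nothing about how the actual fix was implemented.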

On version 4.4, the thread holding the RSTL might have a stack trace like the following:

#0  0x00007f1e44d01c3d in poll () from /lib64/libc.so.6
#1  0x000056130ba24f87 in mongo::transport::TransportLayerASIO::BatonASIO::run(mongo::ClockSource*) ()
#2  0x000056130ba0623d in mongo::transport::TransportLayerASIO::BatonASIO::run_until(mongo::ClockSource*, mongo::Date_t) ()
#3  0x000056130bef5821 in mongo::ClockSource::waitForConditionUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t, mongo::Waitable*) ()
#4  0x000056130beeacd0 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t) ()
#5  0x000056130bea0795 in _ZZN5mongo13Interruptible32waitForConditionOrInterruptUntilISt11unique_lockINS_12latch_detail5LatchEEZNS_28CondVarLockGrantNotification4waitEPNS_16OperationContextENS_8DurationISt5ratioILl1ELl1000EEEEEUlvE_EEbRNS_4stdx18condition_variableERT_NS_6Date_tET0_PNS_10AtomicWordIlEEENKUlSJ_NS0_9WakeSpeedEE1_clESJ_SO_ ()
#6  0x000056130bea0daf in mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >) ()
#7  0x000056130bea29c6 in mongo::LockerImpl::_lockComplete(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t) ()
#8  0x000056130beab773 in mongo::repl::ReplicationStateTransitionLockGuard::waitForLockUntil(mongo::Date_t) ()
#9  0x000056130a3269f7 in mongo::repl::ReplicationCoordinatorImpl::AutoGetRstlForStepUpStepDown::AutoGetRstlForStepUpStepDown(mongo::repl::ReplicationCoordinatorImpl*, mongo::OperationContext*, mongo::repl::ReplicationCoordinator::OpsKillingStateTransitionEnum, mongo::Date_t) ()
#10 0x000056130a34bee9 in mongo::repl::ReplicationCoordinatorImpl::_stepDownFinish(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::executor::TaskExecutor::EventHandle const&) ()
...

The other thread's stack trace may differ depending on the operation. However, there will be a chunk migration thread in the session migration step (most likely in the SessionCatalogMigrationDestination class).

causes

SERVER-57756 Race between concurrent stepdowns and applying transaction oplog entry

Closed

related to

SERVER-55007 Deadlock between step down and MongoDOperationContextSession

Closed

SERVER-60161 Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command

Closed

SERVER-57167 Prevent throwing on session creation due to stepdown before stepdown completes

Closed

Assignee: Randolph Tan
Reporter: Sergi Mateo Bellido
Participants: Githook User, Randolph Tan, Sergi Mateo Bellido
Votes: 0
Watchers: 14

Created: Nov 02 2020 06:18:30 PM UTC
Updated: Oct 29 2023 10:00:58 PM UTC
Resolved: Feb 04 2021 08:20:31 PM UTC
