Type: Bug
Status: Closed
Priority: Major - P3
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.5.10
Component/s: Replication
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Steps To Reproduce:
Sprint: Repl 2017-05-08, Repl 2017-05-29, Repl 2017-07-10
Case:
Linked BF Score: 0
It is possible that, while the "rsBackgroundSync" thread is changing the member state to ROLLBACK, a thread running work on the ReplicationExecutor needs to acquire a lock. This design of holding a LockManager lock while waiting on a condition variable outside of the lock hierarchy is prone to deadlock. For example, in the GDB output below, thread #39 is holding the Global lock in MODE_X and waiting for its task to set the follower mode to MemberState::RS_ROLLBACK in the ReplicationExecutor. The ReplicationExecutor is currently processing a vote response in thread #13, which is waiting for the storage engine to make it durable. The durability thread (#6) is waiting to acquire the MMAPv1 flush lock, which is implicitly held by thread #39 as part of acquiring the global lock.
Thread 39 (Thread 0x7fc1e03f0700 (LWP 20506)):
#0 0x00007fc27f0f5404 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fc2826fba7c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2 0x00007fc28170ea8b in mongo::repl::ReplicationExecutor::Event::waitUntilSignaled() ()
#3 0x00007fc2816f0e7d in mongo::repl::ReplicationCoordinatorImpl::setFollowerMode(mongo::repl::MemberState const&) ()
#4 0x00007fc281735ef8 in mongo::repl::rollback(mongo::OperationContext*, mongo::repl::OplogInterface const&, mongo::repl::RollbackSource const&, int, mongo::repl::ReplicationCoordinator*, mongo::repl::StorageInterface*, std::function<void (int)>) ()
#5 0x00007fc2816037c2 in mongo::repl::BackgroundSync::_runRollback(mongo::OperationContext*, mongo::Status const&, mongo::HostAndPort const&, int, mongo::repl::StorageInterface*) ()
#6 0x00007fc281605b0e in mongo::repl::BackgroundSync::_produce(mongo::OperationContext*) ()
#7 0x00007fc28160661a in mongo::repl::BackgroundSync::_runProducer() ()
#8 0x00007fc28160679a in mongo::repl::BackgroundSync::_run() ()
#9 0x00007fc2826fe690 in execute_native_thread_routine ()
#10 0x00007fc27f0f1184 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007fc27ee1ebed in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

...

Thread 13 (Thread 0x7fc1ed615700 (LWP 20473)):
#0 0x00007fc27f0f5404 in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fc2826fba7c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2 0x00007fc2818d2cab in mongo::CommitNotifier::awaitBeyondNow() ()
#3 0x00007fc2818d6a40 in mongo::dur::(anonymous namespace)::DurableImpl::waitUntilDurable() ()
#4 0x00007fc2816d57e0 in mongo::repl::ReplicationCoordinatorExternalStateImpl::storeLocalLastVoteDocument(mongo::OperationContext*, mongo::repl::LastVote const&) ()
#5 0x00007fc2816ff04b in mongo::repl::ReplicationCoordinatorImpl::_writeLastVoteForMyElection(mongo::repl::LastVote, mongo::executor::TaskExecutor::CallbackArgs const&) ()
#6 0x00007fc28170f840 in mongo::repl::ReplicationExecutor::_doOperation(mongo::OperationContext*, mongo::Status const&, mongo::executor::TaskExecutor::CallbackHandle const&, std::__cxx11::list<mongo::repl::ReplicationExecutor::WorkItem, std::allocator<mongo::repl::ReplicationExecutor::WorkItem> >*, std::mutex*) ()
#7 0x00007fc28170e0ed in mongo::repl::(anonymous namespace)::callNoExcept(std::function<void ()> const&) ()
#8 0x00007fc281715a30 in std::_Function_handler<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&), mongo::repl::ReplicationExecutor::scheduleDBWork(std::function<void (mongo::executor::TaskExecutor::CallbackArgs const&)> const&, mongo::NamespaceString const&, mongo::LockMode)::{lambda(mongo::OperationContext*, mongo::Status const&)#1}>::_M_invoke(std::_Any_data const&, mongo::OperationContext*&&, mongo::Status const&) ()
#9 0x00007fc28175d349 in mongo::repl::(anonymous namespace)::runSingleTask(std::function<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&)> const&, mongo::OperationContext*, mongo::Status const&) [clone .constprop.72] ()
#10 0x00007fc28175e46f in mongo::repl::TaskRunner::_runTasks() ()
#11 0x00007fc281bf38ec in mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*) ()
#12 0x00007fc281bf439c in mongo::ThreadPool::_consumeTasks() ()
#13 0x00007fc281bf4d56 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#14 0x00007fc2826fe690 in execute_native_thread_routine ()
#15 0x00007fc27f0f1184 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007fc27ee1ebed in clone () from target:/lib/x86_64-linux-gnu/libc.so.6

...

Thread 6 (Thread 0x7fc27cd1c700 (LWP 20466)):
#0 0x00007fc27f0f57be in pthread_cond_timedwait@@GLIBC_2.3.2 () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007fc281225fb8 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
#2 0x00007fc28122a6be in mongo::LockerImpl<true>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
#3 0x00007fc2812261d6 in mongo::AutoAcquireFlushLockForMMAPV1Commit::AutoAcquireFlushLockForMMAPV1Commit(mongo::Locker*) ()
#4 0x00007fc2818d7f1f in mongo::dur::durThread(mongo::ClockSource*, long) ()
#5 0x00007fc2826fe690 in execute_native_thread_routine ()
#6 0x00007fc27f0f1184 in start_thread () from target:/lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007fc27ee1ebed in clone () from target:/lib/x86_64-linux-gnu/libc.so.6
Thank you to Benety Goh for helping me with the GDB output.
is related to:
- SERVER-27154 replSetRequestVotes command should wait for durability (Closed)
- SERVER-23908 MMAPv1 DurableImpl::waitUntilDurable should yield the flush lock (Closed)
- SERVER-27282 Clean up and fix bugs in RS rollback error handling (Closed)