[SERVER-34732] collection drop hangs in test of transactions Created: 27/Apr/18  Updated: 25/May/18  Resolved: 11/May/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 3.7.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jeffrey Yemin Assignee: Dianna Hohensee (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File currentOp.txt    
Issue Links:
Depends
depends on SERVER-34917 Move the expired transaction aborter ... Closed
Related
related to SERVER-35217 killSessions command attempts to kill... Closed
related to SERVER-34795 killSessions should kill transactions... Closed
related to SERVER-34800 The transaction aborter thread should... Closed
Operating System: ALL
Steps To Reproduce:
  1. Start a 3-member replica set
  2. Create a collection
  3. Start a transaction in session 1
  4. Insert one document into the collection using session 1
  5. Drop the collection (using an implicit session, call it 2)
  6. Commit the transaction in session 1 (no concurrency here, this happens only after the drop completes)
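
For reference, a hedged mongo shell translation of these steps; the original repro used the Java driver (see the clientMetadata in the currentOp output below), so the exact calls here are illustrative, assuming a 4.0-era shell with the session/transaction helpers:

// Hedged mongo shell sketch; the collection name "drop" matches the
// test.drop namespace seen in the attached backtraces.
db.getSiblingDB("test").createCollection("drop");         // step 2
const session1 = db.getMongo().startSession();            // step 3: session 1
const coll = session1.getDatabase("test").getCollection("drop");
session1.startTransaction();
coll.insertOne({_id: 1});                                 // step 4: write inside the transaction
db.getSiblingDB("test").getCollection("drop").drop();     // step 5: implicit session 2, hangs here
session1.commitTransaction();                             // step 6: only reached if the drop returns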

Expected result:

The drop command waits for 1 minute, after which it returns normally and the collection has been dropped. The commitTransaction then fails with the error "Command failed with error 251: 'Transaction 1 has been aborted.'" This happens sometimes, especially as the very first thing I do on a freshly created replica set.

Actual result:

The drop command hangs indefinitely. This also happens only sometimes, but frequently in the test run immediately after one that produced the expected result.

I've attached the output of db.currentOp during this hang.

Sprint: Storage NYC 2018-05-07, Storage NYC 2018-05-21
Participants:

 Description   

A drop of a collection can hang when there is an open but inactive transaction that has modified a document in that collection.



 Comments   
Comment by Dianna Hohensee (Inactive) [ 11/May/18 ]

jeff.yemin Your repro no longer hangs as of this commit to master.

I'm moving SERVER-34800 to related, as opposed to a dependency. Transactions only take IS and IX locks in v4.0, so I don't think aborting one transaction and clearing its cursors (which takes IS locks) can block on other transactions' locks.
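
As a sanity check on that claim, a hedged shell snippet that lists the lock modes idle transactions actually hold, using the "inactive transaction" desc and locks field names visible in the currentOp output attached to this ticket:

// Hedged shell sketch: show the lock modes held by idle transactions.
db.currentOp({desc: "inactive transaction"}).inprog.forEach(function(op) {
    printjson({txnNumber: op.txnNumber, locks: op.locks});  // expect only intent modes ("r"/"w")
});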

Please reopen this or a new ticket if you find any more issues. Thanks for all your help!

Comment by Dianna Hohensee (Inactive) [ 09/May/18 ]

This is the drop cmd, waiting on an X lock:

Thread 4 (Thread 0x7f0fe7297700 (LWP 29598)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f101fd2e615 in __gthread_cond_timedwait (__cond=0x7f0fecf570d0, __mutex=0x7f0fecf570a8, __abs_timeout=0x7f0fe7291950) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
#2  0x00007f101fd39a12 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecf570d0, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:165
#3  0x00007f101fd39945 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecf570d0, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:105
#4  0x00007f10147825a9 in mongo::ClockSource::waitForConditionUntil (this=0x7f10115afb20, cv=..., m=..., deadline=...) at src/mongo/util/clock_source.cpp:39
#5  0x00007f10149d0dba in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t)::$_3::operator()() const (this=0x7f0fe7291c00) at src/mongo/db/operation_context.cpp:313
#6  0x00007f10149d09cb in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil (this=0x7f0fecffada0, cv=..., m=..., deadline=...) at src/mongo/db/operation_context.cpp:308
#7  0x00007f10149d0ca7 in mongo::OperationContext::waitForConditionOrInterruptUntil (this=0x7f0fecffada0, cv=..., m=..., deadline=...) at src/mongo/db/operation_context.cpp:248
#8  0x00007f1017a82356 in mongo::OperationContext::waitForConditionOrInterruptUntil<mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1>(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1) (this=0x7f0fecffada0, cv=..., m=..., deadline=..., pred=...) at src/mongo/db/operation_context.h:205
#9  0x00007f1017a813a9 in mongo::OperationContext::waitForConditionOrInterruptFor<mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1>(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Duration<std::ratio<1l, 1000l> >, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1) (this=0x7f0fecffada0, cv=..., m=..., duration=..., pred=...) at src/mongo/db/operation_context.h:221
#10 0x00007f1017a80d40 in mongo::CondVarLockGrantNotification::wait (this=0x7f0fecf570a0, opCtx=0x7f0fecffada0, timeout=...) at src/mongo/db/concurrency/lock_state.cpp:230
#11 0x00007f1017a8ac79 in mongo::LockerImpl<false>::lockComplete (this=0x7f0fecf57020, opCtx=0x7f0fecffada0, resId=..., mode=mongo::MODE_X, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:857
#12 0x00007f1017a8bdb7 in mongo::LockerImpl<false>::lock (this=0x7f0fecf57020, opCtx=0x7f0fecffada0, resId=..., mode=mongo::MODE_X, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:492
#13 0x00007f1017a66b29 in mongo::Lock::DBLock::DBLock (this=0x7f0fe7292828, opCtx=0x7f0fecffada0, db="test", mode=mongo::MODE_X, deadline=...) at src/mongo/db/concurrency/d_concurrency.cpp:211
#14 0x00007f10197f59b2 in mongo::AutoGetDb::AutoGetDb (this=0x7f0fe7292828, opCtx=0x7f0fecffada0, dbName="test", mode=mongo::MODE_X, deadline=...) at src/mongo/db/catalog_raii.cpp:59
#15 0x00007f101c5c6b6d in mongo::dropCollection(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObjBuilder&, mongo::repl::OpTime const&, mongo::DropCollectionSystemCollectionMode)::$_0::operator()() const (this=0x7f0fe7292a30) at src/mongo/db/catalog/drop_collection.cpp:62
#16 0x00007f101c5c6961 in mongo::writeConflictRetry<mongo::dropCollection(mongo::OperationContext*, mongo::NamespaceString const&, mongo::BSONObjBuilder&, mongo::repl::OpTime const&, mongo::DropCollectionSystemCollectionMode)::$_0> (opCtx=0x7f0fecffada0, opStr="drop", ns="test.drop", f=<unknown type in build/ninja/mongo/db/catalog/libcatalog_helpers.so, CU 0xeb93b, DIE 0x117fed>) at src/mongo/db/concurrency/write_conflict_exception.h:91
#17 0x00007f101c5c6739 in mongo::dropCollection (opCtx=0x7f0fecffada0, collectionName=..., result=..., dropOpTime=..., systemCollectionMode=mongo::DropCollectionSystemCollectionMode::kDisallowSystemCollectionDrops) at src/mongo/db/catalog/drop_collection.cpp:61
#18 0x00007f10123102f0 in mongo::(anonymous namespace)::CmdDrop::errmsgRun (this=0x7f10123e9680 <mongo::(anonymous namespace)::cmdDrop>, opCtx=0x7f0fecffada0, dbname=..., cmdObj=unowned BSONObj 172 bytes @ 0x7f0fecf1c43d = {...}, errmsg=..., result=...) at src/mongo/db/commands/dbcommands.cpp:310
#19 0x00007f1014f718d5 in mongo::ErrmsgCommandDeprecated::run (this=0x7f10123e9680 <mongo::(anonymous namespace)::cmdDrop>, opCtx=0x7f0fecffada0, db=..., cmdObj=unowned BSONObj 172 bytes @ 0x7f0fecf1c43d = {...}, result=...) at src/mongo/db/commands.cpp:529
#20 0x00007f1014f7e609 in mongo::BasicCommand::Invocation::run (this=0x7f0fed002c20, opCtx=0x7f0fecffada0, result=0x7f0fe72931f0) at src/mongo/db/commands.cpp:376
#21 0x00007f10144d26ef in mongo::(anonymous namespace)::invokeInTransaction (opCtx=0x7f0fecffada0, invocation=0x7f0fed002c20, replyBuilder=0x7f0fe72931f0) at src/mongo/db/service_entry_point_common.cpp:442
#22 0x00007f10144d14ce in mongo::(anonymous namespace)::runCommandImpl (opCtx=0x7f0fecffada0, invocation=0x7f0fed002c20, request=..., replyBuilder=0x7f0fecd588e0, startOperationTime=..., behaviors=...) at src/mongo/db/service_entry_point_common.cpp:504
#23 0x00007f10144cedea in mongo::(anonymous namespace)::execCommandDatabase (opCtx=0x7f0fecffada0, command=0x7f10123e9680 <mongo::(anonymous namespace)::cmdDrop>, request=..., replyBuilder=0x7f0fecd588e0, behaviors=...) at src/mongo/db/service_entry_point_common.cpp:814
#24 0x00007f10144cb1d0 in mongo::(anonymous namespace)::runCommands(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&)::$_5::operator()() const (this=0x7f0fe7295038) at src/mongo/db/service_entry_point_common.cpp:954
#25 0x00007f10144c812d in mongo::(anonymous namespace)::runCommands (opCtx=0x7f0fecffada0, message=..., behaviors=...) at src/mongo/db/service_entry_point_common.cpp:901
#26 0x00007f10144c6de0 in mongo::ServiceEntryPointCommon::handleRequest (opCtx=0x7f0fecffada0, m=..., behaviors=...) at src/mongo/db/service_entry_point_common.cpp:1246
#27 0x00007f101d76d1f5 in mongo::ServiceEntryPointMongod::handleRequest (this=0x7f1010ffa8e0, opCtx=0x7f0fecffada0, m=...) at src/mongo/db/service_entry_point_mongod.cpp:123
#28 0x00007f101d71c87c in mongo::ServiceStateMachine::_processMessage (this=0x7f0fecf237f0, guard=...) at src/mongo/transport/service_state_machine.cpp:378
#29 0x00007f101d718360 in mongo::ServiceStateMachine::_runNextInGuard (this=0x7f0fecf237f0, guard=...) at src/mongo/transport/service_state_machine.cpp:439
#30 0x00007f101d71b738 in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0::operator()() const (this=0x7f0fecede510) at src/mongo/transport/service_state_machine.cpp:479
#31 0x00007f101d71b53d in std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0>::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
#32 0x00007f101fbcceee in std::function<void ()>::operator()() const (this=0x7f0fe7296020) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
#33 0x00007f101fd3c412 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName) (this=0x7f1010fc88e0, task=..., flags=mongo::transport::ServiceExecutor::kMayRecurse, taskName=mongo::transport::ServiceExecutorTaskName::kSSMProcessMessage) at src/mongo/transport/service_executor_synchronous.cpp:115
#34 0x00007f101d717eb7 in mongo::ServiceStateMachine::_scheduleNextWithGuard (this=0x7f0fecf237f0, guard=..., flags=mongo::transport::ServiceExecutor::kMayRecurse, taskName=mongo::transport::ServiceExecutorTaskName::kSSMProcessMessage, ownershipModel=mongo::ServiceStateMachine::Ownership::kOwned) at src/mongo/transport/service_state_machine.cpp:483
#35 0x00007f101d71765a in mongo::ServiceStateMachine::_sourceCallback (this=0x7f0fecf237f0, status=Status::OK()) at src/mongo/transport/service_state_machine.cpp:303
#36 0x00007f101d719e68 in mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2::operator()(mongo::StatusWith<mongo::Message>) const (this=0x7f0fe7296758, msg=StatusWith(OK, {<boost::optional_detail::optional_base<mongo::Message>> = {<boost::optional_detail::optional_tag> = {<No data fields>}, m_initialized = true, m_storage = {dummy_ = {data = "\000\000\000\000\000\000\000", aligner_ = {<No data fields>}}}}, <No data fields>})) at src/mongo/transport/service_state_machine.cpp:259
#37 0x00007f101d719d2f in mongo::future_details::call<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&, mongo::Message>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&, mongo::Message&&) (func=..., arg=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x3a437>) at src/mongo/util/future.h:111
#38 0x00007f101d719b83 in void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Message&&)#1}::operator()(mongo::Message&&) const (this=0x7f0fe72966f0, val=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x49c1d>) at src/mongo/util/future.h:759
#39 0x00007f101d71996e in mongo::future_details::Future<mongo::Message>::generalImpl<void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Message&&)#1}, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Status&&)#1}, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda()#1}>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Status&&)#1}&&, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda()#1}&&) (this=0x7f0fe7296770, success=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x3f22d>, fail=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x3f23c>, notReady=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x3f24b>) at src/mongo/util/future.h:1031
#40 0x00007f101d7171ad in mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) && (this=0x7f0fe7296770, func=<unknown type in build/ninja/mongo/transport/libservice_entry_point.so, CU 0x1de2b, DIE 0x3f1af>) at src/mongo/util/future.h:757
#41 0x00007f101d716fc4 in mongo::ServiceStateMachine::_sourceMessage (this=0x7f0fecf237f0, guard=...) at src/mongo/transport/service_state_machine.cpp:254
#42 0x00007f101d7182e3 in mongo::ServiceStateMachine::_runNextInGuard (this=0x7f0fecf237f0, guard=...) at src/mongo/transport/service_state_machine.cpp:436
#43 0x00007f101d71b738 in mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0::operator()() const (this=0x7f0fecedda20) at src/mongo/transport/service_state_machine.cpp:479
#44 0x00007f101d71b53d in std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0>::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
#45 0x00007f101fbcceee in std::function<void ()>::operator()() const (this=0x7f100bacf520) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
#46 0x00007f101fd3cf49 in mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::$_1::operator()() const (this=0x7f100c509e00) at src/mongo/transport/service_executor_synchronous.cpp:132
#47 0x00007f101fd3ccbd in std::_Function_handler<void (), mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::$_1>::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
#48 0x00007f101fbcceee in std::function<void ()>::operator()() const (this=0x7f0fecd9e7a0) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
#49 0x00007f1014763cbc in mongo::(anonymous namespace)::runFunc (ctx=0x7f0fecd9e7a0) at src/mongo/transport/service_entry_point_utils.cpp:55
#50 0x00007f101dcf26aa in start_thread (arg=0x7f0fe7297700) at pthread_create.c:333
#51 0x00007f101da27eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment by Dianna Hohensee (Inactive) [ 09/May/18 ]

This is the PeriodicRunnerASIO backtrace for the LogicalSessionCacheImpl::_periodicRefresh job, waiting on an IS lock:

Thread 41 (Thread 0x7f10009fa700 (LWP 29281)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f101fd2e615 in __gthread_cond_timedwait (__cond=0x7f0fecfa8450, __mutex=0x7f0fecfa8428, __abs_timeout=0x7f10009f6f40) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
#2  0x00007f101fd39a12 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecfa8450, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:165
#3  0x00007f101fd39945 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecfa8450, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:105
#4  0x00007f10147825a9 in mongo::ClockSource::waitForConditionUntil (this=0x7f10115afb20, cv=..., m=..., deadline=...) at src/mongo/util/clock_source.cpp:39
#5  0x00007f10149d0dba in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t)::$_3::operator()() const (this=0x7f10009f71f0) at src/mongo/db/operation_context.cpp:313
#6  0x00007f10149d09cb in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil (this=0x7f0fecf78c20, cv=..., m=..., deadline=...) at src/mongo/db/operation_context.cpp:308
#7  0x00007f10149d0ca7 in mongo::OperationContext::waitForConditionOrInterruptUntil (this=0x7f0fecf78c20, cv=..., m=..., deadline=...) at src/mongo/db/operation_context.cpp:248
#8  0x00007f1017a82356 in mongo::OperationContext::waitForConditionOrInterruptUntil<mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1>(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1) (this=0x7f0fecf78c20, cv=..., m=..., deadline=..., pred=...) at src/mongo/db/operation_context.h:205
#9  0x00007f1017a813a9 in mongo::OperationContext::waitForConditionOrInterruptFor<mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1>(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Duration<std::ratio<1l, 1000l> >, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::$_1) (this=0x7f0fecf78c20, cv=..., m=..., duration=..., pred=...) at src/mongo/db/operation_context.h:221
#10 0x00007f1017a80d40 in mongo::CondVarLockGrantNotification::wait (this=0x7f0fecfa8420, opCtx=0x7f0fecf78c20, timeout=...) at src/mongo/db/concurrency/lock_state.cpp:230
#11 0x00007f1017a8ac79 in mongo::LockerImpl<false>::lockComplete (this=0x7f0fecfa83a0, opCtx=0x7f0fecf78c20, resId=..., mode=mongo::MODE_IS, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:857
#12 0x00007f1017a8bdb7 in mongo::LockerImpl<false>::lock (this=0x7f0fecfa83a0, opCtx=0x7f0fecf78c20, resId=..., mode=mongo::MODE_IS, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:492
#13 0x00007f1017a66b29 in mongo::Lock::DBLock::DBLock (this=0x7f10009f8170, opCtx=0x7f0fecf78c20, db="test", mode=mongo::MODE_IS, deadline=...) at src/mongo/db/concurrency/d_concurrency.cpp:211
#14 0x00007f10197f59b2 in mongo::AutoGetDb::AutoGetDb (this=0x7f10009f8170, opCtx=0x7f0fecf78c20, dbName="test", mode=mongo::MODE_IS, deadline=...) at src/mongo/db/catalog_raii.cpp:59
#15 0x00007f10197f5dc0 in mongo::AutoGetCollection::AutoGetCollection (this=0x7f10009f8148, opCtx=0x7f0fecf78c20, nsOrUUID=..., modeDB=mongo::MODE_IS, modeColl=mongo::MODE_IS, viewMode=mongo::AutoGetCollection::kViewsForbidden, deadline=...) at src/mongo/db/catalog_raii.cpp:78
#16 0x00007f101d85f3c4 in mongo::AutoGetCollection::AutoGetCollection (this=0x7f10009f8148, opCtx=0x7f0fecf78c20, nsOrUUID=..., modeAll=mongo::MODE_IS, viewMode=mongo::AutoGetCollection::kViewsForbidden, deadline=...) at src/mongo/db/catalog_raii.h:92
#17 0x00007f101abfae56 in mongo::(anonymous namespace)::GlobalCursorIdCache::visitAllCursorManagers<mongo::CursorManager::appendAllActiveSessions(mongo::OperationContext*, std::unordered_set<mongo::LogicalSessionId, mongo::LogicalSessionIdHash, std::equal_to<mongo::LogicalSessionId>, std::allocator<mongo::LogicalSessionId> >*)::$_3>(mongo::OperationContext*, mongo::CursorManager::appendAllActiveSessions(mongo::OperationContext*, std::unordered_set<mongo::LogicalSessionId, mongo::LogicalSessionIdHash, std::equal_to<mongo::LogicalSessionId>, std::allocator<mongo::LogicalSessionId> >*)::$_3*) (this=0x7f10112d7b40, opCtx=0x7f0fecf78c20, visitor=0x7f10009f8278) at src/mongo/db/cursor_manager.cpp:416
#18 0x00007f101abfabb1 in mongo::CursorManager::appendAllActiveSessions (opCtx=0x7f0fecf78c20, lsids=0x7f10009f8670) at src/mongo/db/cursor_manager.cpp:438
#19 0x00007f101af5c7e5 in mongo::ServiceLiasonMongod::getOpenCursorSessions (this=0x7f1010fb75a0) at src/mongo/db/service_liason_mongod.cpp:83
#20 0x00007f1019ac3705 in mongo::LogicalSessionCacheImpl::_refresh (this=0x7f100d50ec20, client=0x7f100bc052a0) at src/mongo/db/logical_session_cache_impl.cpp:343
#21 0x00007f1019ac43ac in mongo::LogicalSessionCacheImpl::_periodicRefresh (this=0x7f100d50ec20, client=0x7f100bc052a0) at src/mongo/db/logical_session_cache_impl.cpp:162
#22 0x00007f1019ac5710 in mongo::LogicalSessionCacheImpl::LogicalSessionCacheImpl(std::unique_ptr<mongo::ServiceLiason, std::default_delete<mongo::ServiceLiason> >, std::shared_ptr<mongo::SessionsCollection>, std::shared_ptr<mongo::TransactionReaper>, mongo::LogicalSessionCacheImpl::Options)::$_1::operator()(mongo::Client*) const (this=0x7f101105e4b0, client=0x7f100bc052a0) at src/mongo/db/logical_session_cache_impl.cpp:69
#23 0x00007f1019ac55c2 in std::_Function_handler<void (mongo::Client*), mongo::LogicalSessionCacheImpl::LogicalSessionCacheImpl(std::unique_ptr<mongo::ServiceLiason, std::default_delete<mongo::ServiceLiason> >, std::shared_ptr<mongo::SessionsCollection>, std::shared_ptr<mongo::TransactionReaper>, mongo::LogicalSessionCacheImpl::Options)::$_1>::_M_invoke(std::_Any_data const&, mongo::Client*&&) (__functor=..., __args=<unknown type in build/ninja/mongo/db/liblogical_session_cache_impl.so, CU 0x21004, DIE 0x6745d>) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
#24 0x00007f1019d8d414 in std::function<void (mongo::Client*)>::operator()(mongo::Client*) const (this=0x7f101105e4b0, __args=0x7f100bc052a0) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
#25 0x00007f1019d84db0 in mongo::PeriodicRunnerASIO::_scheduleJob(std::weak_ptr<mongo::PeriodicRunnerASIO::PeriodicJobASIO>, bool)::$_0::operator()(std::error_code) (this=0x7f0fecd9f060, ec=...) at src/mongo/util/periodic_runner_asio.cpp:96
#26 0x00007f1019d849aa in std::_Function_handler<void (std::error_code), mongo::PeriodicRunnerASIO::_scheduleJob(std::weak_ptr<mongo::PeriodicRunnerASIO::PeriodicJobASIO>, bool)::$_0>::_M_invoke(std::_Any_data const&, std::error_code&&) (__functor=..., __args=<unknown type in build/ninja/mongo/util/libperiodic_runner_asio.so, CU 0x0, DIE 0x26ffa>) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871
#27 0x00007f1017228ba7 in std::function<void (std::error_code)>::operator()(std::error_code) const (this=0x7f10009f9188, __args=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267
#28 0x00007f1017228b1d in asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>::operator()() (this=0x7f10009f9188) at src/third_party/asio-master/asio/include/asio/detail/bind_handler.hpp:64
#29 0x00007f1017228ae1 in asio::asio_handler_invoke<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code> >(asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&, ...) (function=...) at src/third_party/asio-master/asio/include/asio/handler_invoke_hook.hpp:68
#30 0x00007f1017228ac2 in asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>, std::function<void (std::error_code)> >(asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&, std::function<void (std::error_code)>&) (function=..., context=...) at src/third_party/asio-master/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#31 0x00007f1017228a7d in asio::detail::asio_handler_invoke<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>, std::function<void (std::error_code)>, std::error_code>(asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&, asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>*) (function=..., this_handler=0x7f10009f9188) at src/third_party/asio-master/asio/include/asio/detail/bind_handler.hpp:105
#32 0x00007f1017228760 in asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>, asio::detail::binder1<std::function<void (std::error_code)>, std::error_code> >(asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&, asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&) (function=..., context=...) at src/third_party/asio-master/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#33 0x00007f10172284f7 in asio::detail::strand_service::dispatch<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code> >(asio::detail::strand_service::strand_impl*&, asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&) (this=0x7f100da0d820, impl=@0x7f10009f9350: 0x7f100e20f980, handler=...) at src/third_party/asio-master/asio/include/asio/detail/impl/strand_service.hpp:61
#34 0x00007f10172283d5 in asio::io_context::strand::dispatch<asio::detail::binder1<std::function<void (std::error_code)>, std::error_code> >(asio::detail::binder1<std::function<void (std::error_code)>, std::error_code>&&) (this=0x7f10009f9348, handler=<unknown type in build/ninja/mongo/executor/libasync_timer_asio.so, CU 0x0, DIE 0x1bdfa>) at src/third_party/asio-master/asio/include/asio/io_context_strand.hpp:224
#35 0x00007f101722833f in asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>::operator()<std::error_code>(std::error_code const&) (this=0x7f10009f9348, arg1=...) at src/third_party/asio-master/asio/include/asio/detail/wrapped_handler.hpp:86
#36 0x00007f10172282cf in asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>::operator()() (this=0x7f10009f9348) at src/third_party/asio-master/asio/include/asio/detail/bind_handler.hpp:64
#37 0x00007f10172282a9 in asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >::operator()() (this=0x7f10009f9328) at src/third_party/asio-master/asio/include/asio/detail/wrapped_handler.hpp:190
#38 0x00007f1017228281 in asio::asio_handler_invoke<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&, ...) (function=...) at src/third_party/asio-master/asio/include/asio/handler_invoke_hook.hpp:68
#39 0x00007f1017228262 in asio_handler_invoke_helpers::invoke<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >, std::function<void (std::error_code)> >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&, std::function<void (std::error_code)>&) (function=..., context=...) at src/third_party/asio-master/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#40 0x00007f101722821d in asio::detail::asio_handler_invoke<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >, asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&, asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >*) (function=..., this_handler=0x7f10009f9328) at src/third_party/asio-master/asio/include/asio/detail/wrapped_handler.hpp:274
#41 0x00007f1017227ce0 in asio_handler_invoke_helpers::invoke<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >, asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&, asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&) (function=..., context=...) at src/third_party/asio-master/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#42 0x00007f1017229651 in asio::detail::handler_work<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >, asio::system_executor>::complete<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&, asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&) (this=0x7f10009f9388, function=..., handler=...) at src/third_party/asio-master/asio/include/asio/detail/handler_work.hpp:81
#43 0x00007f1017227fa3 in asio::detail::completion_handler<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned long) (owner=0x7f100db02a20, base=0x7f0fecebdfe0) at src/third_party/asio-master/asio/include/asio/detail/completion_handler.hpp:69
#44 0x00007f1017227ada in asio::detail::strand_service::dispatch<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >(asio::detail::strand_service::strand_impl*&, asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&) (this=0x7f100da0d820, impl=@0x7f10009f9640: 0x7f100e20f980, handler=...) at src/third_party/asio-master/asio/include/asio/detail/impl/strand_service.hpp:87
#45 0x00007f1017227825 in asio::io_context::strand::dispatch<asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> > >(asio::detail::rewrapped_handler<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, std::function<void (std::error_code)> >&&) (this=0x7f10009f9638, handler=<unknown type in build/ninja/mongo/executor/libasync_timer_asio.so, CU 0x0, DIE 0x1b884>) at src/third_party/asio-master/asio/include/asio/io_context_strand.hpp:224
#46 0x00007f1017227789 in asio::detail::asio_handler_invoke<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>(asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>&, asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>*) (function=..., this_handler=0x7f10009f9638) at src/third_party/asio-master/asio/include/asio/detail/wrapped_handler.hpp:231
#47 0x00007f1017227710 in asio_handler_invoke_helpers::invoke<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>, asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running> >(asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>&, asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>&) (function=..., context=...) at src/third_party/asio-master/asio/include/asio/detail/handler_invoke_helpers.hpp:37
#48 0x00007f10172274e1 in asio::detail::handler_work<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, asio::system_executor>::complete<asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code> >(asio::detail::binder1<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>, std::error_code>&, asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running>&) (this=0x7f10009f9678, function=..., handler=...) at src/third_party/asio-master/asio/include/asio/detail/handler_work.hpp:81
#49 0x00007f10172271d5 in asio::detail::wait_handler<asio::detail::wrapped_handler<asio::io_context::strand, std::function<void (std::error_code)>, asio::detail::is_continuation_if_running> >::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned long) (owner=0x7f100db02a20, base=0x7f0fecebdfe0) at src/third_party/asio-master/asio/include/asio/detail/wait_handler.hpp:71
#50 0x00007f10170d9317 in asio::detail::scheduler_operation::complete (this=0x7f0fecebdfe0, owner=0x7f100db02a20, ec=..., bytes_transferred=0) at src/third_party/asio-master/asio/include/asio/detail/scheduler_operation.hpp:39
#51 0x00007f10170ca0d8 in asio::detail::scheduler::do_run_one (this=0x7f100db02a20, lock=..., this_thread=..., ec=...) at src/third_party/asio-master/asio/include/asio/detail/impl/scheduler.ipp:400
#52 0x00007f10170c01d6 in asio::detail::scheduler::run (this=0x7f100db02a20, ec=...) at src/third_party/asio-master/asio/include/asio/detail/impl/scheduler.ipp:153
#53 0x00007f10170c0331 in asio::io_context::run (this=0x7f100db0cbe8, ec=...) at src/third_party/asio-master/asio/include/asio/impl/io_context.ipp:69
#54 0x00007f1019d86d36 in mongo::PeriodicRunnerASIO::_spawnThreads(mongo::WithLock)::$_1::operator()() const (this=0x7f10112548e8) at src/mongo/util/periodic_runner_asio.cpp:112
#55 0x00007f1019d86ba5 in std::_Bind_simple<mongo::PeriodicRunnerASIO::_spawnThreads(mongo::WithLock)::$_1 ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7f10112548e8) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1530
#56 0x00007f1019d86b75 in std::_Bind_simple<mongo::PeriodicRunnerASIO::_spawnThreads(mongo::WithLock)::$_1 ()>::operator()() (this=0x7f10112548e8) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1520
#57 0x00007f1019d86a69 in std::thread::_Impl<std::_Bind_simple<mongo::PeriodicRunnerASIO::_spawnThreads(mongo::WithLock)::$_1 ()> >::_M_run() (this=0x7f10112548d0) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/thread:115
#58 0x00007f101beac220 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
#59 0x00007f101dcf26aa in start_thread (arg=0x7f10009fa700) at pthread_create.c:333
#60 0x00007f101da27eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment by Dianna Hohensee (Inactive) [ 09/May/18 ]

This is the clientcursormon backtrace, waiting on an IS lock:

Thread 46 (Thread 0x7f10031ff700 (LWP 29276)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f101fd2e615 in __gthread_cond_timedwait (__cond=0x7f0fecee3dd0, __mutex=0x7f0fecee3da8, __abs_timeout=0x7f10031fdc10) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
#2  0x00007f101fd39a12 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecee3dd0, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:165
#3  0x00007f101fd39945 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0fecee3dd0, __lock=..., __atime=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:105
#4  0x00007f1017a82236 in std::condition_variable::wait_until<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >)::$_0>(std::unique_lock<std::mutex>&, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&, mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >)::$_0) (this=0x7f0fecee3dd0, __lock=..., __atime=..., __p=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:128
#5  0x00007f1017a81304 in std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l>, mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >)::$_0>(std::unique_lock<std::mutex>&, std::chrono::duration<long, std::ratio<1l, 1000000000l> > const&, mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >)::$_0) (this=0x7f0fecee3dd0, __lock=..., __rtime=..., __p=...) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/condition_variable:144
#6  0x00007f1017a80e4d in mongo::CondVarLockGrantNotification::wait (this=0x7f0fecee3da0, timeout=...) at src/mongo/db/concurrency/lock_state.cpp:221
#7  0x00007f1017a8acd1 in mongo::LockerImpl<false>::lockComplete (this=0x7f0fecee3d20, opCtx=0x7f0fecf247a0, resId=..., mode=mongo::MODE_IS, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:859
#8  0x00007f1017a8bdb7 in mongo::LockerImpl<false>::lock (this=0x7f0fecee3d20, opCtx=0x7f0fecf247a0, resId=..., mode=mongo::MODE_IS, deadline=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:492
#9  0x00007f1017a66b29 in mongo::Lock::DBLock::DBLock (this=0x7f10031fe6c8, opCtx=0x7f0fecf247a0, db="test", mode=mongo::MODE_IS, deadline=...) at src/mongo/db/concurrency/d_concurrency.cpp:211
#10 0x00007f10197f59b2 in mongo::AutoGetDb::AutoGetDb (this=0x7f10031fe6c8, opCtx=0x7f0fecf247a0, dbName="test", mode=mongo::MODE_IS, deadline=...) at src/mongo/db/catalog_raii.cpp:59
#11 0x00007f101abfc6ca in mongo::(anonymous namespace)::GlobalCursorIdCache::timeoutCursors (this=0x7f10112d7b40, opCtx=0x7f0fecf247a0, now=...) at src/mongo/db/cursor_manager.cpp:380
#12 0x00007f101abfc3e1 in mongo::CursorManager::timeoutCursorsGlobal (opCtx=0x7f0fecf247a0, now=...) at src/mongo/db/cursor_manager.cpp:477
#13 0x00007f101abf6bfb in mongo::(anonymous namespace)::ClientCursorMonitor::run (this=0x7f101ae632f8 <mongo::(anonymous namespace)::clientCursorMonitor>) at src/mongo/db/clientcursor.cpp:269
#14 0x00007f101782e404 in mongo::BackgroundJob::jobBody (this=0x7f101ae632f8 <mongo::(anonymous namespace)::clientCursorMonitor>) at src/mongo/util/background.cpp:150
#15 0x00007f1017830fb8 in mongo::BackgroundJob::go()::$_1::operator()() const (this=0x7f1011255248) at src/mongo/util/background.cpp:180
#16 0x00007f1017830f75 in std::_Bind_simple<mongo::BackgroundJob::go()::$_1 ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x7f1011255248) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1530
#17 0x00007f1017830f45 in std::_Bind_simple<mongo::BackgroundJob::go()::$_1 ()>::operator()() (this=0x7f1011255248) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1520
#18 0x00007f1017830e39 in std::thread::_Impl<std::_Bind_simple<mongo::BackgroundJob::go()::$_1 ()> >::_M_run() (this=0x7f1011255230) at /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/thread:115
#19 0x00007f101beac220 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
#20 0x00007f101dcf26aa in start_thread (arg=0x7f10031ff700) at pthread_create.c:333
#21 0x00007f101da27eed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Comment by Dianna Hohensee (Inactive) [ 09/May/18 ]

This is the deadlock with the transaction:

MongoDB Lock at 0x7f0fecd898e0 (mongo::MODE_IX) held by Idle Transaction (LockerId 1626) waited on by "clientcursormon" (Thread 0x7f10031ff700 (LWP 29276))
MongoDB Lock at 0x7f0fecd898e0 (mongo::MODE_IX) held by Idle Transaction (LockerId 1626) waited on by "thread14" (Thread 0x7f10009fa700 (LWP 29281))
MongoDB Lock at 0x7f0fecd898e0 (mongo::MODE_IX) held by Idle Transaction (LockerId 1626) waited on by "conn23" (Thread 0x7f0fe7297700 (LWP 29598))

Comment by Dianna Hohensee (Inactive) [ 09/May/18 ]

I finally got it to reproduce by using Jeff's test and the Java driver.

Made SERVER-34917 another dependency for this ticket's resolution. It explains the deadlock depicted in the backtraces above. The PeriodicRunnerASIO has a task that runs every 5 minutes and takes an IS lock; that task is blocked behind the drop cmd's X lock request, which in turn is blocked on the inactive transaction's IX locks. The transaction aborter task, which could break the cycle by aborting the expired transaction, is waiting behind the blocked PeriodicRunnerASIO task scheduled ahead of it, since all PeriodicRunnerASIO jobs share one thread.
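
For anyone hitting this, a hedged shell one-liner that surfaces the blocked operations in the chain (the desc/opid/locks field names are taken from the output below):

// Hedged shell sketch: show every operation stuck waiting on a lock.
db.currentOp({waitingForLock: true}).inprog.forEach(function(op) {
    printjson({desc: op.desc, opid: op.opid, locks: op.locks});
});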

This is the currentOp info, just like in Jeff's repro:

rs0:PRIMARY> db.currentOp()
{
	"inprog" : [
		{
			"host" : "delphi:27017",
			"desc" : "conn23",
			"connectionId" : 23,
			"client" : "127.0.0.1:37452",
			"clientMetadata" : {
				"driver" : {
					"name" : "mongo-java-driver",
					"version" : "3.8.0-beta2"
				},
				"os" : {
					"type" : "Linux",
					"name" : "Linux",
					"architecture" : "amd64",
					"version" : "4.2.0-30-generic"
				},
				"platform" : "Java/Oracle Corporation/1.8.0_91-8u91-b14-3ubuntu1~15.10.1-b14"
			},
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.401-0400",
			"opid" : 1432,
			"lsid" : {
				"id" : UUID("ce4abd5a-6109-4f37-9de5-94033ddda8e3"),
				"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
			},
			"secs_running" : NumberLong(96),
			"microsecs_running" : NumberLong(96692884),
			"op" : "command",
			"ns" : "test.$cmd",
			"command" : {
				"drop" : "drop",
				"$db" : "test",
				"$clusterTime" : {
					"clusterTime" : Timestamp(1525807512, 1),
					"signature" : {
						"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
						"keyId" : NumberLong(0)
					}
				},
				"lsid" : {
					"id" : UUID("ce4abd5a-6109-4f37-9de5-94033ddda8e3")
				}
			},
			"numYields" : 0,
			"locks" : {
				"Global" : "w",
				"Database" : "W"
			},
			"waitingForLock" : true,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(1)
					}
				},
				"Database" : {
					"acquireCount" : {
						"W" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"W" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"W" : NumberLong(96499739)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "rsSync-0",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1962,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(1)
					}
				},
				"Collection" : {
					"acquireCount" : {
						"r" : NumberLong(1)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "NoopWriter",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1909,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
 
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "conn1",
			"connectionId" : 1,
			"client" : "127.0.0.1:37388",
			"appName" : "MongoDB Shell",
			"clientMetadata" : {
				"application" : {
					"name" : "MongoDB Shell"
				},
				"driver" : {
					"name" : "MongoDB Internal Client",
					"version" : "3.7.7"
				},
				"os" : {
					"type" : "Linux",
					"name" : "Ubuntu",
					"architecture" : "x86_64",
					"version" : "15.10"
				}
			},
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1963,
			"secs_running" : NumberLong(0),
			"microsecs_running" : NumberLong(2241),
			"op" : "command",
			"ns" : "admin.$cmd.aggregate",
			"command" : {
				"currentOp" : 1,
				"$clusterTime" : {
					"clusterTime" : Timestamp(1525807594, 9),
					"signature" : {
						"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
						"keyId" : NumberLong(0)
					}
				},
				"$db" : "admin"
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
 
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "WT RecordStoreThread: local.oplog.rs",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 200,
			"secs_running" : NumberLong(299),
			"microsecs_running" : NumberLong(299153975),
			"op" : "none",
			"ns" : "local.oplog.rs",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"w" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"w" : NumberLong(171940)
					}
				},
				"Database" : {
					"acquireCount" : {
						"w" : NumberLong(1)
					}
				},
				"oplog" : {
					"acquireCount" : {
						"w" : NumberLong(1)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "ReplBatcher",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1961,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
 
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "monitoring keys for HMAC",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 471,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
 
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "clientcursormon",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1452,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
				"Global" : "r",
				"Database" : "r"
			},
			"waitingForLock" : true,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"r" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"r" : NumberLong(93027432)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "conn13",
			"connectionId" : 13,
			"client" : "127.0.0.1:37422",
			"clientMetadata" : {
				"driver" : {
					"name" : "NetworkInterfaceTL",
					"version" : "3.7.7"
				},
				"os" : {
					"type" : "Linux",
					"name" : "Ubuntu",
					"architecture" : "x86_64",
					"version" : "15.10"
				}
			},
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1916,
			"secs_running" : NumberLong(4),
			"microsecs_running" : NumberLong(4819571),
			"op" : "getmore",
			"ns" : "local.oplog.rs",
			"command" : {
				"getMore" : NumberLong("21888135713"),
				"collection" : "oplog.rs",
				"batchSize" : 13981010,
				"maxTimeMS" : NumberLong(5000),
				"term" : NumberLong(1),
				"lastKnownCommittedOpTime" : {
					"ts" : Timestamp(1525807594, 9),
					"t" : NumberLong(1)
				},
				"$replData" : 1,
				"$oplogQueryData" : 1,
				"$readPreference" : {
					"mode" : "secondaryPreferred"
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1525807594, 9),
					"signature" : {
						"hash" : BinData(0,"Ps1aSrD0TTOV/ethB3tOGp+UE8Q="),
						"keyId" : NumberLong("6553292547987341314")
					}
				},
				"$db" : "local"
			},
			"originatingCommand" : {
				"find" : "oplog.rs",
				"filter" : {
					"ts" : {
						"$gte" : Timestamp(1525807310, 1)
					}
				},
				"tailable" : true,
				"oplogReplay" : true,
				"awaitData" : true,
				"maxTimeMS" : NumberLong(60000),
				"batchSize" : 13981010,
				"term" : NumberLong(1),
				"readConcern" : {
					"afterClusterTime" : Timestamp(1525807310, 1)
				},
				"$replData" : 1,
				"$oplogQueryData" : 1,
				"$readPreference" : {
					"mode" : "secondaryPreferred"
				},
				"$clusterTime" : {
					"clusterTime" : Timestamp(1525807323, 3),
					"signature" : {
						"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
						"keyId" : NumberLong(0)
					}
				},
				"$db" : "local"
			},
			"planSummary" : "COLLSCAN",
			"numYields" : 2,
			"locks" : {
 
			},
			"waitingForLock" : false,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				},
				"oplog" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "PeriodicRunnerASIO",
			"active" : true,
			"currentOpTime" : "2018-05-08T15:26:49.402-0400",
			"opid" : 1833,
			"op" : "none",
			"ns" : "",
			"command" : {
 
			},
			"numYields" : 0,
			"locks" : {
				"Global" : "r",
				"Database" : "r"
			},
			"waitingForLock" : true,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(23),
						"w" : NumberLong(19)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(2),
						"w" : NumberLong(18),
						"W" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"r" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"r" : NumberLong(14499878)
					}
				},
				"Collection" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(10)
					}
				},
				"oplog" : {
					"acquireCount" : {
						"w" : NumberLong(9)
					}
				}
			}
		},
		{
			"host" : "delphi:27017",
			"desc" : "inactive transaction",
			"lsid" : {
				"id" : UUID("76daaa8c-5c98-44fa-9181-ea3e168cab4a"),
				"uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
			},
			"txnNumber" : NumberLong(1),
			"waitingForLock" : false,
			"active" : false,
			"locks" : {
				"Global" : "w",
				"Database" : "w",
				"Collection" : "w"
			},
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(2),
						"w" : NumberLong(1)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(1)
					}
				},
				"Collection" : {
					"acquireCount" : {
						"r" : NumberLong(1),
						"w" : NumberLong(1)
					}
				}
			}
		}
	],
	"ok" : 1,
	"operationTime" : Timestamp(1525807594, 9),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1525807594, 9),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

Comment by James Wahlin [ 02/May/18 ]

Created SERVER-34800 to change the transaction timeout mechanism to use a 0-second lock acquisition timeout.

Comment by Dianna Hohensee (Inactive) [ 02/May/18 ]

schwerin, I agree separate threads would be much safer in general. I'll write up a SERVER ticket later requesting such changes.

Comment by Spencer Brody (Inactive) [ 02/May/18 ]

Could we loop through all the transactions, aborting them and saving their cursors aside, then do a second pass to kill all cursors?

Even with that we could still get blocked by transactions that haven't timed out yet, so your suggestion, james.wahlin, is still probably a good idea.

Comment by James Wahlin [ 02/May/18 ]

The periodic transaction abort task will take locks when trying to kill a cursor that belongs to a collection cursor manager. That should occur after the transaction's resources are freed, though, so I would expect the drop command to complete and the cursor cleanup to finish afterwards.

That said, I can see there being a problem if the drop command were blocked by more than one transaction. In that case we would free the first transaction's resources but then block on its cursor cleanup, which is blocked by the drop, which is in turn blocked by the second transaction.

Maybe operations run by the periodic transaction kill task should use the 0-second lock timeout? What I think we should do is:

  1. Loop through each transaction
  2. When a timed-out transaction is encountered:
    1. Abort the transaction, freeing locks and WiredTiger snapshot
    2. Attempt to kill associated cursors
    3. Move on to the next transaction, regardless of whether cursor cleanup was successful

In the above scenario, any cursor cleanup that fails would be left to the cursor timeout task; those cursors would no longer be accessible by clients, since the associated transaction was already aborted. (See the sketch below.)
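
For illustration, a minimal JavaScript sketch of that control flow. The helpers findExpiredTransactions, abortTransaction, and tryKillCursors are hypothetical stand-ins for server internals, not real APIs:

// Hypothetical sketch of the proposed periodic abort task; the helper
// functions passed in are stand-ins for server internals.
function runTransactionAbortTask(findExpiredTransactions, abortTransaction, tryKillCursors) {
    for (const txn of findExpiredTransactions()) {
        // Step 1: abort first, releasing locks and the WiredTiger snapshot.
        abortTransaction(txn);
        // Step 2: best-effort cursor kill with a 0-second lock timeout,
        // so it fails fast instead of queueing behind a pending X lock.
        try {
            tryKillCursors(txn);
        } catch (e) {
            // Step 3: move on regardless; the cursor timeout task will
            // eventually clean up any cursors left behind.
        }
    }
}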

Comment by Andy Schwerin [ 02/May/18 ]

Perhaps one of the other periodic jobs in PRASIO is taking the lock? I remain pretty suspicious of the design of PeriodicRunnerASIO, because it runs all the jobs in the same thread. For the number of jobs we have, I still think having a thread for each job and waking that thread up periodically is a much better solution.

Comment by Dianna Hohensee (Inactive) [ 02/May/18 ]

So, from the currentOp output, it actually looks like PeriodicRunnerASIO and clientcursormon are both blocked behind the drop command, which is waiting for the database X lock and is stuck because the inactive transaction holds the IX lock. The LockManager prioritizes strong lock (X) requests, queueing even compatible IS requests behind them. PeriodicRunnerASIO is what runs the periodic task to abort transactions, so that is the problem here. The question becomes why/where PeriodicRunnerASIO is taking a lock, which isn't obvious; I'll explore that next.
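
For reference, the blocked operations can be pulled straight out of currentOp from the shell (this just filters the same output attached above):

// List every operation stuck waiting on a lock, plus the locks it holds.
// Against the attached dump, this shows the drop command waiting for the
// database X lock, with clientcursormon and PeriodicRunnerASIO queued
// behind it on IS ("r") requests.
db.currentOp({waitingForLock: true}).inprog.forEach(function(op) {
    print(op.desc + " opid=" + op.opid + " locks=" + tojson(op.locks));
});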

Comment by Dianna Hohensee (Inactive) [ 01/May/18 ]

Thanks for linking that, shane.harvey! I was just looking at and wondering about the three operations that have

"waitingForLock" : true 

in the attached currentOp.txt. One of them is "clientcursormon", so it looks likely.

Comment by Shane Harvey [ 01/May/18 ]

This could be related to SERVER-34781.

Comment by Jeffrey Yemin [ 01/May/18 ]

Pretty consistently on the second run.

Comment by Dianna Hohensee (Inactive) [ 01/May/18 ]

Awesome, thanks! And how frequently does the testing hang for you?

Comment by Jeffrey Yemin [ 30/Apr/18 ]

Here's the Java program I was using to reproduce this: https://gist.github.com/jyemin/c5876c1d1503aa6e3790d6fb5ab34267

Differences I see:

  1. You're setting priority to 0 for the other members
  2. You're setting causal to false
  3. You're not dropping the collection at the start of each run
  4. You're creating the collection by inserting a document rather than calling the create command
  5. You're running it in a tight loop. I'm doing one run at a time and starting the test program manually (this is adding additional indeterminacy to my repro attempts).
  6. You're setting a shorter transactionLifetimeLimitSeconds

Here are the JSON representations of the commands and responses that the Java driver is actually sending in the repro script:

{ "drop" : "drop", "$db" : "test", "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 1 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } }, "lsid" : { "id" : { "$binary" : "drTHQ6WQS6O565azrq96tw==", "$type" : "04" } } }
Command failed with error 26: 'ns not found' on server localhost:27017. The full response is { "ok" : 0.0, "errmsg" : "ns not found", "code" : 26, "codeName" : "NamespaceNotFound", "operationTime" : { "$timestamp" : { "t" : 1525127680, "i" : 1 } }, "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 1 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } } }
 
Create collection
{ "create" : "drop", "capped" : false, "$db" : "test", "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 1 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } }, "lsid" : { "id" : { "$binary" : "drTHQ6WQS6O565azrq96tw==", "$type" : "04" } } }
{ "ok" : 1.0, "operationTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } } }
 
Insert document
{ "insert" : "drop", "ordered" : true, "$db" : "test", "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } }, "lsid" : { "id" : { "$binary" : "drTHQ6WQS6O565azrq96tw==", "$type" : "04" } }, "txnNumber" : { "$numberLong" : "1" }, "stmtId" : 0, "startTransaction" : true, "autocommit" : false, "documents" : [{ "_id" : { "$oid" : "5ae79a0086ee727915f59f93" } }] }
{ "n" : 1, "opTime" : { "ts" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "t" : { "$numberLong" : "1" } }, "electionId" : { "$oid" : "7fffffff0000000000000001" }, "ok" : 1.0, "operationTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } } }
 
Drop collection
{ "drop" : "drop", "$db" : "test", "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1525127680, "i" : 2 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } }, "lsid" : { "id" : { "$binary" : "2LSN5415Q72I3xUiPD1yPQ==", "$type" : "04" } } }

Comment by Dianna Hohensee (Inactive) [ 30/Apr/18 ]

jeff.yemin How frequently did this failure reproduce for you? I've written a JS test here https://gist.github.com/DiannaHohensee/5e18eaa05e45c075cc1cdbb95cf8e3a1, but it isn't readily failing: could you take a look at my repro and check that the scenario is correct?

Or if you wrote up your repro, I'd love to take a look at it.

Thanks!

Comment by Jeffrey Yemin [ 30/Apr/18 ]

No, if the previous run was successful I didn't touch the replica set. A new session is created for each test run. The hang is longer than two minutes.

Comment by Dianna Hohensee (Inactive) [ 30/Apr/18 ]

jeff.yemin Is a new replica set started up for each test run? If the same replica set is left running, is a new session created for each test run, or is the same session reused? I'm currently trying to reproduce it.

And I presume the hang lasts longer than 2 minutes? The code to abort expired transactions runs every 1 minute by default, so an abort can take up to 2 minutes if a transaction expires with just the right timing.
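
As a side note, if it helps the repro: the lifetime limit can be lowered at runtime, assuming a build that exposes the transactionLifetimeLimitSeconds server parameter:

// Shorten the transaction lifetime so the periodic abort task (which still
// runs on its ~1 minute interval) considers transactions expired sooner.
db.adminCommand({setParameter: 1, transactionLifetimeLimitSeconds: 10});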

Thanks.
