Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-28746

eval can deadlock on mmap flush lock when waiting for writeConcern

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.4.3, 3.5.5
    • Component/s: MMAPv1, Storage
    • Labels:
    • Storage Execution
    • ALL
    • Hide

      Before running this test, you need to uncomment it. It was disabled with SERVER-23728.

      buildscripts/resmoke.py --executor=replica_sets_jscore_passthrough jstests/core/eval_wait_for_read_write_concern.js --storageEngine=mmapv1
      
      Show
      Before running this test, you need to uncomment it. It was disabled with SERVER-23728 . buildscripts/resmoke.py --executor=replica_sets_jscore_passthrough jstests/core/eval_wait_for_read_write_concern.js --storageEngine=mmapv1
    • 0

      This deadlock was discovered after enabling the test jstests/core/eval_wait_for_read_write_concern.js. It was originally committed by kaloian.manassiev with SERVER-25468 but never enabled. I believe this test was added as a regression test for a previous deadlock.

      After analyzing the following files from the hang_analyzer.py script (debugger_mongod_4492.log, debugger_waitsfor_mongod_4492.gv, debugger_mongod_4492_raw_stacks.log), I'm able to piece together the deadlock cycle.

      LWP 11315 is running some dbEval code. Based on the waitsfor graph in debugger_waitsfor_mongod_4492.gv, it appears to be holding the lock 0x7fb0cc63dc80.

      Thread 64 (Thread 0x7fb0c8c4d700 (LWP 11315)):
      #0  0x00007fb0c622368c in pthread_cond_wait@@GLIBC_2.3.2 ()
         from /lib64/libpthread.so.0
      #1  0x00007fb0cadca1bc in __gthread_cond_wait (__mutex=<optimized out>, 
          __cond=__cond@entry=0x7fb0d02b6740)
          at /data/mci/9888f91d5702e71f5ec369c08a52cfa3/toolchain-builder/build-gcc-v2.sh-kv4/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864
      #2  std::condition_variable::wait (this=this@entry=0x7fb0d02b6740, __lock=...)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/condition_variable.cc:53
      #3  0x00007fb0ca2a04a3 in std::condition_variable::wait<mongo::mozjs::MozJSProxyScope::runOnImplThread(std::function<void()>)::<lambda()> > (__p=..., 
          __lock=..., this=0x7fb0d02b6740)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:98
      #4  mongo::mozjs::MozJSProxyScope::runOnImplThread(std::function<void ()>) (
          this=this@entry=0x7fb0d02b6640, f=...)
          at src/mongo/scripting/mozjs/proxyscope.cpp:286
      #5  0x00007fb0ca2a2770 in mongo::mozjs::MozJSProxyScope::run<mongo::mozjs::MozJSProxyScope::invoke(mongo::ScriptingFunction, const mongo::BSONObj*, const mongo::BSONObj*, int, bool, bool, bool)::<lambda()> > (closure=<optimized out>, 
          this=0x7fb0d02b6640) at src/mongo/scripting/mozjs/proxyscope.cpp:274
      #6  mongo::mozjs::MozJSProxyScope::invoke (this=0x7fb0d02b6640, func=1, 
          argsObject=0x7fb0c8c4a510, recv=0x0, timeoutMs=0, ignoreReturn=false, 
          readOnlyArgs=false, readOnlyRecv=false)
          at src/mongo/scripting/mozjs/proxyscope.cpp:212
      #7  0x00007fb0c978df1e in mongo::(anonymous namespace)::dbEval (
          opCtx=opCtx@entry=0x7fb0d02b3960, dbName=..., cmd=..., result=..., 
          errmsg=...) at src/mongo/db/commands/eval.cpp:124
      #8  0x00007fb0c978ea47 in mongo::(anonymous namespace)::CmdEval::run (
          this=<optimized out>, opCtx=0x7fb0d02b3960, dbname=..., cmdObj=..., 
          options=<optimized out>, errmsg=..., result=...)
          at src/mongo/db/commands/eval.cpp:188
      #9  0x00007fb0c9776068 in mongo::Command::run (
          this=this@entry=0x7fb0cbae4d60 <mongo::(anonymous namespace)::cmdeval>, 
          opCtx=opCtx@entry=0x7fb0d02b3960, request=..., 
          replyBuilder=replyBuilder@entry=0x7fb0c8c4b5e0)
          at src/mongo/db/commands/dbcommands.cpp:1450
      #10 0x00007fb0c9777e29 in mongo::execCommandDatabase (opCtx=<optimized out>, 
          command=<optimized out>, request=..., replyBuilder=<optimized out>)
          at src/mongo/db/commands/dbcommands.cpp:1693
      #11 0x00007fb0c974491c in std::function<void (mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)>::operator()(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) const (
          __args#3=0x7fb0c8c4b5e0, __args#2=..., 
          __args#1=0x7fb0cbae4d60 <mongo::(anonymous namespace)::cmdeval>, 
          __args#0=0x7fb0d02b3960, 
          this=0x7fb0cbae25e0 <mongo::(anonymous namespace)::execCommandHandler>)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #12 mongo::Command::execCommand (opCtx=opCtx@entry=0x7fb0d02b3960, 
          command=command@entry=0x7fb0cbae4d60 <mongo::(anonymous namespace)::cmdeval>, request=..., replyBuilder=replyBuilder@entry=0x7fb0c8c4b5e0)
          at src/mongo/db/commands.cpp:452
      #13 0x00007fb0c9e60738 in mongo::runCommands (
          opCtx=opCtx@entry=0x7fb0d02b3960, request=..., 
          replyBuilder=replyBuilder@entry=0x7fb0c8c4b5e0)
          at src/mongo/db/run_commands.cpp:73
      #14 0x00007fb0c966ca0a in mongo::(anonymous namespace)::receivedRpc (
          client=..., message=..., dbResponse=..., opCtx=0x7fb0d02b3960)
          at src/mongo/db/assemble_response.cpp:220
      #15 mongo::assembleResponse (opCtx=0x7fb0d02b3960, m=..., dbresponse=..., 
          remote=...) at src/mongo/db/assemble_response.cpp:558
      #16 0x00007fb0c958735d in mongo::ServiceEntryPointMongod::_sessionLoop (
          this=this@entry=0x7fb0cc73f9a0, session=...)
          at src/mongo/db/service_entry_point_mongod.cpp:139
      #17 0x00007fb0c9587c3d in mongo::ServiceEntryPointMongod::<lambda(const SessionHandle&)>::operator() (session=..., __closure=<optimized out>)
          at src/mongo/db/service_entry_point_mongod.cpp:104
      #18 std::_Function_handler<void(const std::shared_ptr<mongo::transport::Session>&), mongo::ServiceEntryPointMongod::startSession(mongo::transport::SessionHandle)::<lambda(const SessionHandle&)> >::_M_invoke(const std::_Any_data &, const std::shared_ptr<mongo::transport::Session> &) (__functor=..., __args#0=...)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1871
      #19 0x00007fb0ca2bfc64 in std::function<void (std::shared_ptr<mongo::transport::Session> const&)>::operator()(std::shared_ptr<mongo::transport::Session> const&) const (__args#0=..., this=0x7fb0cdf18a80)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #20 mongo::(anonymous namespace)::runFunc (ptr=0x7fb0cdf18a70)
          at src/mongo/transport/service_entry_point_utils.cpp:78
      #21 0x00007fb0c621faa1 in start_thread () from /lib64/libpthread.so.0
      #22 0x00007fb0c5f6cbbd in clone () from /lib64/libc.so.6
      

      It spawns the Implementation thread LWP 11318 and blocks waiting for this thread to complete. LWP 11318 performs the write operation and blocks for the write concern to complete.

      Thread 65 (Thread 0x7fb03ca87700 (LWP 11318)):
      #0  0x00007fb0c622368c in pthread_cond_wait@@GLIBC_2.3.2 ()
         from /lib64/libpthread.so.0
      #1  0x00007fb0cadca1bc in __gthread_cond_wait (__mutex=<optimized out>, 
          __cond=__cond@entry=0x7fb0cbb011a8 <mongo::dur::(anonymous namespace)::commitNotify+40>)
          at /data/mci/9888f91d5702e71f5ec369c08a52cfa3/toolchain-builder/build-gcc-v2.sh-kv4/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864
      #2  std::condition_variable::wait (
          this=this@entry=0x7fb0cbb011a8 <mongo::dur::(anonymous namespace)::commitNotify+40>, __lock=...)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/condition_variable.cc:53
      #3  0x00007fb0c9fb4e6b in mongo::CommitNotifier::awaitBeyondNow (
          this=this@entry=0x7fb0cbb01180 <mongo::dur::(anonymous namespace)::commitNotify>) at src/mongo/db/storage/mmap_v1/commit_notifier.cpp:61
      #4  0x00007fb0c9fb8c00 in mongo::dur::(anonymous namespace)::DurableImpl::waitUntilDurable (this=<optimized out>) at src/mongo/db/storage/mmap_v1/dur.cpp:538
      #5  0x00007fb0ca0826a4 in mongo::waitForWriteConcern (opCtx=0x7fb0d02b3960, 
          replOpTime=..., writeConcern=..., result=result@entry=0x7fb03ca83320)
          at src/mongo/db/write_concern.cpp:214
      #6  0x00007fb0c977657a in mongo::Command::run (
          this=this@entry=0x7fb0cbae9640 <mongo::cmdInsert>, 
          opCtx=opCtx@entry=0x7fb0d02b3960, request=..., 
          replyBuilder=replyBuilder@entry=0x7fb03ca83b60)
          at src/mongo/db/commands/dbcommands.cpp:1476
      #7  0x00007fb0c9777e29 in mongo::execCommandDatabase (opCtx=<optimized out>, 
          command=<optimized out>, request=..., replyBuilder=<optimized out>)
          at src/mongo/db/commands/dbcommands.cpp:1693
      #8  0x00007fb0c974491c in std::function<void (mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)>::operator()(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) const (
          __args#3=0x7fb03ca83b60, __args#2=..., 
          __args#1=0x7fb0cbae9640 <mongo::cmdInsert>, __args#0=0x7fb0d02b3960, 
          this=0x7fb0cbae25e0 <mongo::(anonymous namespace)::execCommandHandler>)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #9  mongo::Command::execCommand (opCtx=opCtx@entry=0x7fb0d02b3960, 
          command=command@entry=0x7fb0cbae9640 <mongo::cmdInsert>, request=..., 
          replyBuilder=replyBuilder@entry=0x7fb03ca83b60)
          at src/mongo/db/commands.cpp:452
      #10 0x00007fb0c9e60738 in mongo::runCommands (
          opCtx=opCtx@entry=0x7fb0d02b3960, request=..., 
          replyBuilder=replyBuilder@entry=0x7fb03ca83b60)
          at src/mongo/db/run_commands.cpp:73
      #11 0x00007fb0c9669fd2 in mongo::(anonymous namespace)::receivedCommand (
          opCtx=opCtx@entry=0x7fb0d02b3960, nss=..., dbResponse=..., message=..., 
          client=...) at src/mongo/db/assemble_response.cpp:180
      #12 0x00007fb0c966c41e in mongo::assembleResponse (opCtx=0x7fb0d02b3960, 
          m=..., dbresponse=..., remote=...)
          at src/mongo/db/assemble_response.cpp:553
      #13 0x00007fb0c986aa1f in mongo::DBDirectClient::call (this=0x7fb0d781c800, 
          toSend=..., response=..., assertOk=<optimized out>, 
          actualServer=<optimized out>) at src/mongo/db/dbdirectclient.cpp:128
      #14 0x00007fb0c95fd938 in mongo::DBClientWithCommands::runCommandWithMetadata
          (this=0x7fb0d781c800, database=..., command=..., metadata=..., 
          commandArgs=...) at src/mongo/client/dbclient.cpp:205
      #15 0x00007fb0c95f8df3 in mongo::DBClientWithCommands::runCommandWithMetadataAndTarget (this=this@entry=0x7fb0d781c800, database=..., command=..., 
          metadata=..., commandArgs=...) at src/mongo/client/dbclient.cpp:244
      #16 0x00007fb0c95f9389 in mongo::DBClientWithCommands::runCommandWithTarget (
          this=0x7fb0d781c800, dbname=..., cmd=..., info=..., 
          options=<optimized out>) at src/mongo/client/dbclient.cpp:261
      #17 0x00007fb0ca27fe5d in mongo::mozjs::MongoBase::Functions::runCommand::call
          (cx=cx@entry=0x7fb0cca17580, args=...)
          at src/mongo/scripting/mozjs/mongo.cpp:207
      #18 0x00007fb0ca28937d in mongo::mozjs::smUtils::wrapConstrainedMethod<mongo::mozjs::MongoBase::Functions::runCommand, true, mongo::mozjs::MongoLocalInfo, mongo::mozjs::MongoExternalInfo> (cx=cx@entry=0x7fb0cca17580, argc=3, 
          vp=0x7fb0d7796220)
          at src/mongo/scripting/mozjs/wrapconstrainedmethod.h:116
      #19 0x00007fb0caa8bb2b in js::CallJSNative (args=..., native=<optimized out>, 
          cx=0x7fb0cca17580)
          at src/third_party/mozjs-45/extract/js/src/jscntxtinlines.h:240
      #20 js::Invoke (cx=0x7fb0cca17580, args=..., construct=<optimized out>)
          at src/third_party/mozjs-45/extract/js/src/vm/Interpreter.cpp:444
      #21 0x00007fb0caa874d1 in Interpret (cx=0x7fb0cca17580, state=...)
          at src/third_party/mozjs-45/extract/js/src/vm/Interpreter.cpp:2766
      #22 0x00007fb0caa8b8a3 in js::RunScript (cx=cx@entry=0x7fb0cca17580, 
          state=...)
          at src/third_party/mozjs-45/extract/js/src/vm/Interpreter.cpp:391
      #23 0x00007fb0caa8bc12 in js::Invoke (cx=cx@entry=0x7fb0cca17580, args=..., 
          construct=construct@entry=js::NO_CONSTRUCT)
          at src/third_party/mozjs-45/extract/js/src/vm/Interpreter.cpp:462
      #24 0x00007fb0caa8c5b0 in js::Invoke (cx=0x7fb0cca17580, thisv=..., fval=..., 
          argc=0, argv=<optimized out>, rval=...)
          at src/third_party/mozjs-45/extract/js/src/vm/Interpreter.cpp:496
      #25 0x00007fb0ca93a882 in JS_CallFunctionValue (cx=<optimized out>, obj=..., 
          obj@entry=..., fval=..., fval@entry=..., args=..., rval=..., 
          rval@entry=...) at src/third_party/mozjs-45/extract/js/src/jsapi.cpp:2790
      #26 0x00007fb0ca271e1c in JS::Call (rval=..., args=..., fun=..., thisObj=..., 
          cx=<optimized out>) at src/third_party/mozjs-45/include/jsapi.h:3179
      #27 mongo::mozjs::MozJSImplScope::invoke (this=0x7fb0d0318400, 
          func=<optimized out>, argsObject=<optimized out>, recv=0x0, timeoutMs=0, 
          ignoreReturn=<optimized out>, readOnlyArgs=false, readOnlyRecv=false)
          at src/mongo/scripting/mozjs/implscope.cpp:672
      #28 0x00007fb0ca2a2875 in mongo::mozjs::MozJSProxyScope::<lambda()>::operator() (__closure=<optimized out>) at src/mongo/scripting/mozjs/proxyscope.cpp:210
      #29 std::_Function_handler<void(), mongo::mozjs::MozJSProxyScope::invoke(mongo::ScriptingFunction, const mongo::BSONObj*, const mongo::BSONObj*, int, bool, bool, bool)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1871
      #30 0x00007fb0ca2a2e63 in std::function<void ()>::operator()() const (
          this=0x7fb0d02b6710)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #31 mongo::mozjs::MozJSProxyScope::implThread (arg=0x7fb0d02b6640)
          at src/mongo/scripting/mozjs/proxyscope.cpp:355
      #32 0x00007fb0ca256e4c in nspr::Thread::ThreadRoutine (arg=0x7fb0cc8767e0)
          at src/mongo/scripting/mozjs/PosixNSPR.cpp:57
      #33 0x00007fb0cadccdd0 in std::execute_native_thread_routine (
          __p=<optimized out>)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
      #34 0x00007fb0c621faa1 in start_thread () from /lib64/libpthread.so.0
      #35 0x00007fb0c5f6cbbd in clone () from /lib64/libc.so.6
      

      Meanwhile, the durability thread LWP 4499 spawns the journal writer thread LWP 4500. The journal writer thread is blocking waiting for an in-memory journal buffer on the _journalQueue.
      No journal buffer is pushed on the queue because the durability thread 4499 is blocked waiting to acquire the mmapv1 flush lock. I am not sure where the flush lock is acquired by 11315.

      Thread 6 (Thread 0x7fb0bd9d7700 (LWP 4499)):
      #0  0x00007fb0c6223a5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
         from /lib64/libpthread.so.0
      #1  0x00007fb0c9852918 in __gthread_cond_timedwait (
          __abs_timeout=0x7fb0bd9d4f20, __mutex=<optimized out>, 
          __cond=0x7fb0cc8b99d8)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
      #2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., 
          this=0x7fb0cc8b99d8)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:165
      #3  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., 
          this=0x7fb0cc8b99d8)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:105
      #4  std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l> > (
          __rtime=..., __lock=<synthetic pointer>..., this=0x7fb0cc8b99d8)
          at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:137
      #5  mongo::CondVarLockGrantNotification::wait (
          this=this@entry=0x7fb0cc8b99a8, timeoutMs=timeoutMs@entry=500)
          at src/mongo/db/concurrency/lock_state.cpp:227
      #6  0x00007fb0c985701e in mongo::LockerImpl<true>::lockComplete (
          this=0x7fb0cc8b9400, resId=..., mode=<optimized out>, 
          timeoutMs=4294967295, checkDeadlock=true)
          at src/mongo/db/concurrency/lock_state.cpp:744
      #7  0x00007fb0c9852b36 in mongo::AutoAcquireFlushLockForMMAPV1Commit::AutoAcquireFlushLockForMMAPV1Commit (this=0x7fb0bd9d5340, locker=<optimized out>)
          at src/mongo/db/concurrency/lock_state.cpp:880
      #8  0x00007fb0c9fba0df in mongo::dur::durThread (cs=0x7fb0cc8569a0, 
          serverStartMs=1491439819653) at src/mongo/db/storage/mmap_v1/dur.cpp:731
      #9  0x00007fb0cadccdd0 in std::execute_native_thread_routine (
          __p=<optimized out>)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
      #10 0x00007fb0c621faa1 in start_thread () from /lib64/libpthread.so.0
      #11 0x00007fb0c5f6cbbd in clone () from /lib64/libc.so.6
      
      Thread 7 (Thread 0x7fb0bcfd6700 (LWP 4500)):
      #0  0x00007fb0c622368c in pthread_cond_wait@@GLIBC_2.3.2 ()
         from /lib64/libpthread.so.0
      #1  0x00007fb0cadca1bc in __gthread_cond_wait (__mutex=<optimized out>, 
          __cond=__cond@entry=0x7fb0bd9d6650)
          at /data/mci/9888f91d5702e71f5ec369c08a52cfa3/toolchain-builder/build-gcc-v2.sh-kv4/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864
      #2  std::condition_variable::wait (this=this@entry=0x7fb0bd9d6650, __lock=...)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/condition_variable.cc:53
      #3  0x00007fb0c9fcb823 in mongo::BlockingQueue<mongo::dur::JournalWriter::Buffer*>::blockingPop (this=0x7fb0bd9d6570) at src/mongo/util/queue.h:172
      #4  mongo::dur::JournalWriter::<lambda()>::operator() (
          __closure=<optimized out>)
          at src/mongo/db/storage/mmap_v1/dur_journal_writer.cpp:219
      #5  mongo::dur::JournalWriter::_journalWriterThread (this=0x7fb0bd9d6550)
          at src/mongo/db/storage/mmap_v1/dur_journal_writer.cpp:220
      #6  0x00007fb0cadccdd0 in std::execute_native_thread_routine (
          __p=<optimized out>)
          at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
      #7  0x00007fb0c621faa1 in start_thread () from /lib64/libpthread.so.0
      #8  0x00007fb0c5f6cbbd in clone () from /lib64/libc.so.6
      

      Supposedly the durability thread once it gets the flush lock should write a journal buffer to the _journalQueue at which point the journal writer would pop the buffer off the queue and persist it to the journal. The journal writer then notifies the Implementation thread that is waiting in waitForWriteConcern.

      The cycle is thus:
      11315 (holds flush lock) -----> waits for 11318 -----> 11318 waits for write concern
      4500 (blocks on _journalQueue) ----> can't notify thread 11318
      4499 (blocking for flush lock) ----> can't write buffer to _journalQueue

      Seems like the 11315 thread should be yielding the flush lock somewhere before waiting for the Implementation thread to complete the wait on write concern.

      One other thought brought up was whether this test case should be placed in the jstests/replsets directory as opposed to jstests/core.

        1. debugger_mongod_4492_raw_stacks.log
          1.45 MB
        2. debugger_mongod_4492.log
          131 kB
        3. debugger_waitsfor_mongod_4492.gv
          23 kB

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            eddie.louie Eddie Louie
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: