Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-28188

Deadlock involving ViewCatalog mutex on MMAPv1, when system.views write with $where is executed in parallel with collection reads

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.5.3
    • Component/s: MMAPv1
    • Storage Execution
    • ALL
    • Hide

      I tried to minimize the steps required to reproduce this hang, but was not successful. It seems like running two threads in parallel on a single mongod does not readily reproduce this hang, even though it appears like only 2 threads are required to generate this deadlock. However, running this in a sharded cluster on mmapv1 with journaling does reproduce it consistently if I use the original fuzzer generated file that I've attached.

      Show
      I tried to minimize the steps required to reproduce this hang, but was not successful. It seems like running two threads in parallel on a single mongod does not readily reproduce this hang, even though it appears like only 2 threads are required to generate this deadlock. However, running this in a sharded cluster on mmapv1 with journaling does reproduce it consistently if I use the original fuzzer generated file that I've attached.
    • Query 2017-04-17
    • 0

      I found the following two relevant stacks:

      Thread 68 (Thread 0x7f4b2bfdd700 (LWP 27095)):
      #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
      #1  0x00007f4b66f0edbd in __GI___pthread_mutex_lock (mutex=0x55a85f914d38) at ../nptl/pthread_mutex_lock.c:80
      #2  0x000055a85643e372 in __gthread_mutex_lock (__mutex=0x55a85f914d38) at /usr/include/x86_64-linux-gnu/c++/5/bits/gthr-default.h:748
      #3  std::mutex::lock (this=0x55a85f914d38) at /usr/include/c++/5/mutex:135
      #4  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/5/mutex:386
      #5  mongo::ViewCatalog::lookup (this=0x55a85f914d38, txn=txn@entry=0x55a85feb80e0, ns=...) at src/mongo/db/views/view_catalog.cpp:376
      #6  0x000055a855cdf92f in mongo::AutoGetCollectionOrViewForRead::AutoGetCollectionOrViewForRead (this=0x7f4b2bfd6c80, txn=0x55a85feb80e0, nss=...) at src/mongo/db/db_raii.cpp:164
      #7  0x000055a855fe54ac in mongo::runQuery[abi:cxx11](mongo::OperationContext*, mongo::QueryMessage&, mongo::NamespaceString const&, mongo::Message&) (txn=txn@entry=0x55a85feb80e0, q=..., nss=..., result=...) at src/mongo/db/query/find.cpp:530
      #8  0x000055a855ae4a79 in mongo::(anonymous namespace)::receivedQuery (c=..., m=..., dbResponse=..., nss=..., txn=0x55a85feb80e0) at src/mongo/db/assemble_response.cpp:312
      #9  mongo::assembleResponse (txn=0x55a85feb80e0, m=..., dbresponse=..., remote=...) at src/mongo/db/assemble_response.cpp:551
      #10 0x000055a855cdfcfc in mongo::DBDirectClient::call (this=0x55a8610b2f00, toSend=..., response=..., assertOk=<optimized out>, actualServer=<optimized out>) at src/mongo/db/dbdirectclient.cpp:128
      #11 0x000055a855a7e72e in mongo::DBClientCursor::init (this=this@entry=0x55a860103800) at src/mongo/client/dbclientcursor.cpp:145
      #12 0x000055a855a5f0c6 in mongo::DBClientBase::query (this=0x55a8610b2f00, ns="test.system.js", query=..., nToReturn=0, nToSkip=0, fieldsToReturn=0x0, queryOptions=4, batchSize=0) at src/mongo/client/dbclient.cpp:1020
      #13 0x000055a855cdfb60 in mongo::DBDirectClient::query (this=<optimized out>, ns=..., query=..., nToReturn=<optimized out>, nToSkip=<optimized out>, fieldsToReturn=0x0, queryOptions=4, batchSize=0) at src/mongo/db/dbdirectclient.cpp:152
      #14 0x000055a85661352c in mongo::Scope::loadStored (this=0x55a85fb55640, txn=0x55a85feb80e0, ignoreNotConnected=<optimized out>) at src/mongo/scripting/engine.cpp:222
      #15 0x000055a85661265f in mongo::ScriptEngine::getPooledScope (this=0x55a85912c040, txn=txn@entry=0x55a85feb80e0, db="test", scopeType="where") at src/mongo/scripting/engine.cpp:548
      #16 0x000055a855e4d1cc in mongo::WhereMatchExpression::init (this=this@entry=0x55a8612a1580, dbName=...) at src/mongo/db/matcher/expression_where.cpp:75
      #17 0x000055a855e4d4ad in mongo::WhereMatchExpression::shallowClone (this=0x55a8612a0780) at src/mongo/db/matcher/expression_where.cpp:116
      #18 0x000055a856044fa7 in mongo::QueryPlannerAccess::makeCollectionScan (query=..., tailable=tailable@entry=false, params=...) at src/mongo/db/query/planner_access.cpp:150
      #19 0x000055a856057996 in mongo::buildCollscanSoln (query=..., tailable=tailable@entry=false, params=...) at src/mongo/db/query/query_planner.cpp:243
      #20 0x000055a85605d927 in mongo::QueryPlanner::plan (query=..., params=..., out=out@entry=0x7f4b2bfd9a60) at src/mongo/db/query/query_planner.cpp:956
      #21 0x000055a855ff1b0c in mongo::(anonymous namespace)::prepareExecution (opCtx=opCtx@entry=0x55a85feb80e0, collection=collection@entry=0x55a8612d7900, ws=0x55a8603cac80, canonicalQuery=std::unique_ptr<mongo::CanonicalQuery> containing 0x55a8607b70e0, plannerOptions=plannerOptions@entry=0) at src/mongo/db/query/get_executor.cpp:378
      #22 0x000055a855ff5c88 in mongo::getExecutorDelete (txn=txn@entry=0x55a85feb80e0, opDebug=opDebug@entry=0x7f4b2bfda6f8, collection=0x55a8612d7900, parsedDelete=parsedDelete@entry=0x7f4b2bfda400) at src/mongo/db/query/get_executor.cpp:781
      #23 0x000055a855e97238 in mongo::performSingleDeleteOp (op=..., ns=..., txn=0x55a85feb80e0) at src/mongo/db/ops/write_ops_exec.cpp:640
      #24 mongo::performDeletes (txn=txn@entry=0x55a85feb80e0, wholeOp=...) at src/mongo/db/ops/write_ops_exec.cpp:692
      #25 0x000055a855cba81f in mongo::CmdDelete::runImpl (this=<optimized out>, txn=0x55a85feb80e0, dbname=..., cmdObj=..., result=...) at src/mongo/db/commands/write_commands/write_commands.cpp:363
      #26 0x000055a855cb5f9c in mongo::(anonymous namespace)::WriteCommand::run (this=<optimized out>, txn=0x55a85feb80e0, dbname=..., cmdObj=..., options=<optimized out>, errmsg=..., result=...) at src/mongo/db/commands/write_commands/write_commands.cpp:217
      #27 0x000055a855beecb4 in mongo::Command::run (this=this@entry=0x55a857d870a0 <mongo::cmdDelete>, txn=txn@entry=0x55a85feb80e0, request=..., replyBuilder=replyBuilder@entry=0x7f4b2bfdb590) at src/mongo/db/commands/dbcommands.cpp:1396
      #28 0x000055a855befce3 in mongo::execCommandDatabase (txn=0x55a85feb80e0, command=<optimized out>, request=..., replyBuilder=<optimized out>) at src/mongo/db/commands/dbcommands.cpp:1595
      #29 0x000055a855bbdefc in std::function<void (mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)>::operator()(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) const (__args#3=0x7f4b2bfdb590, __args#2=..., __args#1=0x55a857d870a0 <mongo::cmdDelete>, __args#0=0x55a85feb80e0, this=0x55a857d803e0 <mongo::(anonymous namespace)::execCommandHandler>) at /usr/include/c++/5/functional:2267
      #30 mongo::Command::execCommand (txn=txn@entry=0x55a85feb80e0, command=command@entry=0x55a857d870a0 <mongo::cmdDelete>, request=..., replyBuilder=replyBuilder@entry=0x7f4b2bfdb590) at src/mongo/db/commands.cpp:401
      #31 0x000055a85621f70d in mongo::runCommands (txn=txn@entry=0x55a85feb80e0, request=..., replyBuilder=replyBuilder@entry=0x7f4b2bfdb590) at src/mongo/db/run_commands.cpp:73
      #32 0x000055a855ae4c5b in mongo::(anonymous namespace)::receivedRpc (client=..., message=..., dbResponse=..., txn=0x55a85feb80e0) at src/mongo/db/assemble_response.cpp:217
      #33 mongo::assembleResponse (txn=0x55a85feb80e0, m=..., dbresponse=..., remote=...) at src/mongo/db/assemble_response.cpp:554
      #34 0x000055a8559f074d in mongo::ServiceEntryPointMongod::_sessionLoop (this=this@entry=0x55a85939f300, session=std::shared_ptr (count 2, weak 2) 0x55a85d376140) at src/mongo/db/service_entry_point_mongod.cpp:135
      #35 0x000055a8559f108d in mongo::ServiceEntryPointMongod::<lambda(const SessionHandle&)>::operator() (session=..., __closure=<optimized out>) at src/mongo/db/service_entry_point_mongod.cpp:103
      #36 std::_Function_handler<void(const std::shared_ptr<mongo::transport::Session>&), mongo::ServiceEntryPointMongod::startSession(mongo::transport::SessionHandle)::<lambda(const SessionHandle&)> >::_M_invoke(const std::_Any_data &, const std::shared_ptr<mongo::transport::Session> &) (__functor=..., __args#0=...) at /usr/include/c++/5/functional:1871
      #37 0x000055a856681382 in std::function<void (std::shared_ptr<mongo::transport::Session> const&)>::operator()(std::shared_ptr<mongo::transport::Session> const&) const (__args#0=std::shared_ptr (count 2, weak 2) 0x55a85d376140, this=0x55a8603c6870) at /usr/include/c++/5/functional:2267
      #38 mongo::(anonymous namespace)::runFunc (ptr=0x55a8603c6860) at src/mongo/transport/service_entry_point_utils.cpp:78
      #39 0x00007f4b66f0c6ba in start_thread (arg=0x7f4b2bfdd700) at pthread_create.c:333
      #40 0x00007f4b66c4282d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

      And

      Thread 71 (Thread 0x7f4b2b6ff700 (LWP 27099)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
      #1  0x000055a855cc7fe8 in __gthread_cond_timedwait (__abs_timeout=0x7f4b2b6fbb90, __mutex=<optimized out>, __cond=0x55a85d54e5d8) at /usr/include/x86_64-linux-gnu/c++/5/bits/gthr-default.h:871
      #2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>, this=0x55a85d54e5d8) at /usr/include/c++/5/condition_variable:165
      #3  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>, this=0x55a85d54e5d8) at /usr/include/c++/5/condition_variable:105
      #4  std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l> > (__rtime=..., __lock=<synthetic pointer>, this=0x55a85d54e5d8) at /usr/include/c++/5/condition_variable:137
      #5  mongo::CondVarLockGrantNotification::wait (this=this@entry=0x55a85d54e5a8, timeoutMs=timeoutMs@entry=500) at src/mongo/db/concurrency/lock_state.cpp:238
      #6  0x000055a855ccc95e in mongo::LockerImpl<true>::lockComplete (this=0x55a85d54e000, resId=..., mode=<optimized out>, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:749
      #7  0x000055a85643551b in mongo::DurableViewCatalogImpl::iterate(mongo::OperationContext*, std::function<mongo::Status (mongo::BSONObj const&)>) (this=0x55a85f914d28, txn=0x55a85d7bd260, callback=...) at src/mongo/db/views/durable_view_catalog.cpp:75
      #8  0x000055a85643da81 in mongo::ViewCatalog::_reloadIfNeeded_inlock (this=this@entry=0x55a85f914d38, txn=txn@entry=0x55a85d7bd260) at src/mongo/db/views/view_catalog.cpp:98
      #9  0x000055a85643e274 in mongo::ViewCatalog::_lookup_inlock (this=this@entry=0x55a85f914d38, txn=0x55a85d7bd260, ns=...) at src/mongo/db/views/view_catalog.cpp:360
      #10 0x000055a85643e38d in mongo::ViewCatalog::lookup (this=0x55a85f914d38, txn=txn@entry=0x55a85d7bd260, ns=...) at src/mongo/db/views/view_catalog.cpp:377
      #11 0x000055a855cdf24f in mongo::AutoGetCollection::AutoGetCollection (this=0x7f4b2b6fc608, txn=0x55a85d7bd260, nss=..., modeDB=<optimized out>, modeColl=mongo::MODE_IX, viewMode=<optimized out>) at src/mongo/db/db_raii.cpp:59
      #12 0x000055a855e994a8 in mongo::AutoGetCollection::AutoGetCollection (modeColl=<optimized out>, modeDB=mongo::MODE_IX, nss=..., txn=0x55a85d7bd260, this=0x7f4b2b6fc608) at src/mongo/db/db_raii.h:87
      #13 boost::optional_detail::optional_base<mongo::AutoGetCollection>::emplace_assign<mongo::OperationContext*&, mongo::NamespaceString const&, mongo::LockMode, mongo::LockMode>(mongo::OperationContext*&, mongo::NamespaceString const&, mongo::LockMode&&, mongo::LockMode&&) (this=0x7f4b2b6fc600) at src/third_party/boost-1.60.0/boost/optional/optional.hpp:494
      #14 boost::optional<mongo::AutoGetCollection>::emplace<mongo::OperationContext*&, mongo::NamespaceString const&, mongo::LockMode, mongo::LockMode>(mongo::OperationContext*&, mongo::NamespaceString const&, mongo::LockMode&&, mongo::LockMode&&) (this=0x7f4b2b6fc600) at src/third_party/boost-1.60.0/boost/optional/optional.hpp:981
      #15 mongo::performSingleUpdateOp (op=..., ns=..., txn=0x55a85d7bd260) at src/mongo/db/ops/write_ops_exec.cpp:515
      #16 mongo::performUpdates (txn=txn@entry=0x55a85d7bd260, wholeOp=...) at src/mongo/db/ops/write_ops_exec.cpp:585
      #17 0x000055a855cba64f in mongo::CmdUpdate::runImpl (this=<optimized out>, txn=0x55a85d7bd260, dbname=..., cmdObj=..., result=...) at src/mongo/db/commands/write_commands/write_commands.cpp:295
      #18 0x000055a855cb5f9c in mongo::(anonymous namespace)::WriteCommand::run (this=<optimized out>, txn=0x55a85d7bd260, dbname=..., cmdObj=..., options=<optimized out>, errmsg=..., result=...) at src/mongo/db/commands/write_commands/write_commands.cpp:217
      #19 0x000055a855beecb4 in mongo::Command::run (this=this@entry=0x55a857d87180 <mongo::cmdUpdate>, txn=txn@entry=0x55a85d7bd260, request=..., replyBuilder=replyBuilder@entry=0x7f4b2b6fd590) at src/mongo/db/commands/dbcommands.cpp:1396
      #20 0x000055a855befce3 in mongo::execCommandDatabase (txn=0x55a85d7bd260, command=<optimized out>, request=..., replyBuilder=<optimized out>) at src/mongo/db/commands/dbcommands.cpp:1595
      #21 0x000055a855bbdefc in std::function<void (mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*)>::operator()(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) const (__args#3=0x7f4b2b6fd590, __args#2=..., __args#1=0x55a857d87180 <mongo::cmdUpdate>, __args#0=0x55a85d7bd260, this=0x55a857d803e0 <mongo::(anonymous namespace)::execCommandHandler>) at /usr/include/c++/5/functional:2267
      #22 mongo::Command::execCommand (txn=txn@entry=0x55a85d7bd260, command=command@entry=0x55a857d87180 <mongo::cmdUpdate>, request=..., replyBuilder=replyBuilder@entry=0x7f4b2b6fd590) at src/mongo/db/commands.cpp:401
      #23 0x000055a85621f70d in mongo::runCommands (txn=txn@entry=0x55a85d7bd260, request=..., replyBuilder=replyBuilder@entry=0x7f4b2b6fd590) at src/mongo/db/run_commands.cpp:73
      #24 0x000055a855ae4c5b in mongo::(anonymous namespace)::receivedRpc (client=..., message=..., dbResponse=..., txn=0x55a85d7bd260) at src/mongo/db/assemble_response.cpp:217
      #25 mongo::assembleResponse (txn=0x55a85d7bd260, m=..., dbresponse=..., remote=...) at src/mongo/db/assemble_response.cpp:554
      #26 0x000055a8559f074d in mongo::ServiceEntryPointMongod::_sessionLoop (this=this@entry=0x55a85939f300, session=std::shared_ptr (count 2, weak 2) 0x55a85d3756c0) at src/mongo/db/service_entry_point_mongod.cpp:135
      #27 0x000055a8559f108d in mongo::ServiceEntryPointMongod::<lambda(const SessionHandle&)>::operator() (session=..., __closure=<optimized out>) at src/mongo/db/service_entry_point_mongod.cpp:103
      #28 std::_Function_handler<void(const std::shared_ptr<mongo::transport::Session>&), mongo::ServiceEntryPointMongod::startSession(mongo::transport::SessionHandle)::<lambda(const SessionHandle&)> >::_M_invoke(const std::_Any_data &, const std::shared_ptr<mongo::transport::Session> &) (__functor=..., __args#0=...) at /usr/include/c++/5/functional:1871
      #29 0x000055a856681382 in std::function<void (std::shared_ptr<mongo::transport::Session> const&)>::operator()(std::shared_ptr<mongo::transport::Session> const&) const (__args#0=std::shared_ptr (count 2, weak 2) 0x55a85d3756c0, this=0x55a8603c5610) at /usr/include/c++/5/functional:2267
      #30 mongo::(anonymous namespace)::runFunc (ptr=0x55a8603c5600) at src/mongo/transport/service_entry_point_utils.cpp:78
      #31 0x00007f4b66f0c6ba in start_thread (arg=0x7f4b2b6ff700) at pthread_create.c:333
      #32 0x00007f4b66c4282d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

      max.hirschhorn and I took a look at this for a bit and tried many different configurations to narrow down the scenario. We're not sure why this doesn't repro on a single mongod with 2 threads. Max did bring up the possibility that this could be related to a long opened server ticket SERVER-10747. Based on one of the stacks, we do know that the $where operator is involved in some obscure way. Also, the system.views collection needs to be refreshed from disk in order to trigger this deadlock.

      So far, two ways I know to reproduce this. I've attached the 8ef4-mdb_3aef-ent_8392-qa_c627a-1485290758685-63.js file to be placed in the <mongo>/jstestfuzz/out directory. There is a higher probability to reproduce if you have a larger number of clients per fixture, like 10.

      1. buildscripts/resmoke.py --suite=jstestfuzz_sharded --storageEngine=mmapv1 --numClientsPerFixture=10
      2. buildscripts/resmoke.py --suite=jstestfuzz_replication --storageEngine=mmapv1 --numClientsPerFixture=10
      

      I was able to determine that the following cycle exists.

      LWP 27095 ----> (waiting for mutex) 0x55a85f914d38 ----> (held by) LWP 27099 ----> (waiting for MongoDB lock) 0x55a8612a1380 ----> (held by) LWP 27095
      

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            eddie.louie Eddie Louie
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: