Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-34833

Deadlock between the logical sessions' transaction reaper and MMAP V1 durability thread

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.6.4
    • Fix Version/s: 3.6.6, 4.0.0-rc0
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.6
    • Sprint:
      Sharding 2018-05-21
    • Linked BF Score:
      50

      Description

      The logical sessions' transaction reaper for sharded systems performs refresh of the sharding catalog cache while holding collection lock on config.system.sessions.

      This refresh internally may end up having to read from config.cache.chunks.config.system.sessions in order to pick-up the locally cached routing info for the sharded sessions collection. If in the mean time the MMAP V1 durability thread runs, this would lead to deadlock.

      This is evident in the wait states of these three threads:

      Transaction reaper thread:

      Thread 17: "conn1" (Thread 0x7f60c9684700 (LWP 3024))
      #0  0x00007f60c6c5a68c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007f60cbaf586c in __gthread_cond_wait (__mutex=<optimized out>, __cond=__cond@entry=0x7f60d45c2c38) at /data/mci/f4077f70c6c50d9bf86e67716e864062/toolchain-builder/build-gcc-v2.sh-IiX/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linu
      x/bits/gthr-default.h:864
      #2  std::condition_variable::wait (this=this@entry=0x7f60d45c2c38, __lock=...) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/condition_variable.cc:53
      #3  0x00007f60cb86e3ab in mongo::OperationContext::<lambda()>::operator() (__closure=<optimized out>) at src/mongo/db/operation_context.cpp:322
      #4  mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil (this=0x7f60d222ea00, cv=..., m=..., deadline=...) at src/mongo/db/operation_context.cpp:326
      #5  0x00007f60cb86e540 in mongo::OperationContext::waitForConditionOrInterruptNoAssert (this=<optimized out>, cv=..., m=...) at src/mongo/db/operation_context.cpp:249
      #6  0x00007f60cb86e60d in mongo::OperationContext::waitForConditionOrInterrupt (this=this@entry=0x7f60d222ea00, cv=..., m=...) at src/mongo/db/operation_context.cpp:244
      #7  0x00007f60caa22a9e in mongo::OperationContext::waitForConditionOrInterrupt<mongo::Notification<mongo::Status>::get(mongo::OperationContext*)::{lambda()#1}>(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Notification<mongo::Status>::
      get(mongo::OperationContext*)::{lambda()#1}) (pred=..., m=..., cv=..., this=0x7f60d222ea00) at src/mongo/db/operation_context.h:199
      #8  mongo::Notification<mongo::Status>::get (opCtx=0x7f60d222ea00, this=0x7f60d45c2c10) at src/mongo/util/concurrency/notification.h:71
      #9  mongo::CatalogCache::<lambda()>::operator() (__closure=<optimized out>) at src/mongo/s/catalog_cache.cpp:172
      #10 mongo::CatalogCache::getCollectionRoutingInfo (this=this@entry=0x7f60cfdf0680, opCtx=opCtx@entry=0x7f60d222ea00, nss=...) at src/mongo/s/catalog_cache.cpp:176
      #11 0x00007f60caa23790 in mongo::CatalogCache::getCollectionRoutingInfo (this=0x7f60cfdf0680, opCtx=opCtx@entry=0x7f60d222ea00, ns=...) at src/mongo/s/catalog_cache.cpp:192
      #12 0x00007f60ca2ff59f in mongo::(anonymous namespace)::ShardedHandler::handleLsid (lsid=..., this=<optimized out>, this=<optimized out>) at src/mongo/db/transaction_reaper.cpp:207
      #13 mongo::(anonymous namespace)::TransactionReaperImpl<mongo::(anonymous namespace)::ShardedHandler>::reap (this=<optimized out>, opCtx=<optimized out>) at src/mongo/db/transaction_reaper.cpp:138
      #14 0x00007f60cb2a936b in mongo::LogicalSessionCacheImpl::_reap (this=0x7f60ce91c680, client=<optimized out>) at src/mongo/db/logical_session_cache_impl.cpp:192
      #15 0x00007f60cb2a9480 in mongo::LogicalSessionCacheImpl::reapNow (this=<optimized out>, client=<optimized out>) at src/mongo/db/logical_session_cache_impl.cpp:147
      #16 0x00007f60ca5228ae in mongo::(anonymous namespace)::ReapLogicalSessionCacheNowCommand::run (this=this@entry=0x7f60ce917e00, opCtx=opCtx@entry=0x7f60d222ea00, db=..., cmdObj=unowned BSONObj 107 bytes @ 0x7f60d223d29d = {...}, result=...) at src/mongo/db/commands/reap_logical_session_cache_now.cpp:77
      #17 0x00007f60cb383066 in mongo::BasicCommand::enhancedRun (this=0x7f60ce917e00, opCtx=0x7f60d222ea00, request=..., result=...) at src/mongo/db/commands.cpp:398
      #18 0x00007f60cb37da9f in mongo::Command::publicRun (this=0x7f60ce917e00, opCtx=0x7f60d222ea00, request=..., result=...) at src/mongo/db/commands.cpp:336
      #19 0x00007f60ca1fd863 in mongo::(anonymous namespace)::runCommandImpl (startOperationTime=..., replyBuilder=0x7f60ce7a87e0, request=..., command=0x7f60ce917e00, opCtx=0x7f60d222ea00) at src/mongo/db/service_entry_point_mongod.cpp:448
      #20 mongo::(anonymous namespace)::execCommandDatabase (opCtx=<optimized out>, command=command@entry=0x7f60ce917e00, request=..., replyBuilder=<optimized out>, this=<optimized out>, this=<optimized out>) at src/mongo/db/service_entry_point_mongod.cpp:713
      #21 0x00007f60ca1ff926 in mongo::(anonymous namespace)::<lambda()>::operator()(void) const (__closure=__closure@entry=0x7f60c9682470) at src/mongo/db/service_entry_point_mongod.cpp:829
      #22 0x00007f60ca2006a7 in mongo::(anonymous namespace)::runCommands (message=..., opCtx=0x7f60d222ea00) at src/mongo/db/service_entry_point_mongod.cpp:836
      

      Catalog cache loader thread:

      Thread 75: "ConfigServerCatalogCacheLoader-0" (Thread 0x7f60845b1700 (LWP 3486))
      #0  0x00007f60c6c5aa5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007f60cb40e9b8 in __gthread_cond_timedwait (__abs_timeout=0x7f60845ac3c0, __mutex=<optimized out>, __cond=0x7f60d59759d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
      #2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., this=0x7f60d59759d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:165
      #3  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., this=0x7f60d59759d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:105
      #4  std::condition_variable::wait_until<std::chrono::_V2::system_clock, std::chrono::duration<long int, std::ratio<1l, 1000000000l> >, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds)::<lambda()> > (__p=..., __atime=..., __lock=<synthetic
      pointer>..., this=0x7f60d59759d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:128
      #5  std::condition_variable::wait_for<long int, std::ratio<1l, 1000000000l>, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds)::<lambda()> > (__p=..., __rtime=..., __lock=<synthetic pointer>..., this=0x7f60d59759d8) at /opt/mongodbtoolchain
      /v2/include/c++/5.4.0/condition_variable:144
      #6  mongo::CondVarLockGrantNotification::wait (this=this@entry=0x7f60d59759a8, timeout=..., timeout@entry=...) at src/mongo/db/concurrency/lock_state.cpp:226
      #7  0x00007f60cb412e7e in mongo::LockerImpl<true>::lockComplete (this=0x7f60d5975400, resId=..., mode=<optimized out>, timeout=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:747
      #8  0x00007f60cb410b5b in mongo::LockerImpl<true>::lockMMAPV1Flush (this=0x7f60d5975400) at src/mongo/db/concurrency/lock_state.cpp:349
      #9  0x00007f60cb402380 in mongo::Lock::DBLock::DBLock (this=0x7f60845ac810, opCtx=0x7f60d6bf8cc0, db="config", mode=<optimized out>) at src/mongo/db/concurrency/d_concurrency.cpp:201
      #10 0x00007f60ca2527df in mongo::FindCmd::run (this=this@entry=0x7f60cc844540 <mongo::findCmd>, opCtx=opCtx@entry=0x7f60d6bf8cc0, dbname=..., cmdObj=unowned BSONObj 121 bytes @ 0x7f60d5981b1d = {...}, result=...) at src/mongo/db/commands/find_cmd.cpp:26
      2
      #11 0x00007f60cb383066 in mongo::BasicCommand::enhancedRun (this=0x7f60cc844540 <mongo::findCmd>, opCtx=0x7f60d6bf8cc0, request=..., result=...) at src/mongo/db/commands.cpp:398
      #12 0x00007f60cb37da9f in mongo::Command::publicRun (this=0x7f60cc844540 <mongo::findCmd>, opCtx=0x7f60d6bf8cc0, request=..., result=...) at src/mongo/db/commands.cpp:336
      #13 0x00007f60ca1fd863 in mongo::(anonymous namespace)::runCommandImpl (startOperationTime=..., replyBuilder=0x7f60d5962e60, request=..., command=0x7f60cc844540 <mongo::findCmd>, opCtx=0x7f60d6bf8cc0) at src/mongo/db/service_entry_point_mongod.cpp:448
      #14 mongo::(anonymous namespace)::execCommandDatabase (opCtx=<optimized out>, command=command@entry=0x7f60cc844540 <mongo::findCmd>, request=..., replyBuilder=<optimized out>, this=<optimized out>, this=<optimized out>) at src/mongo/db/service_entry_poi
      nt_mongod.cpp:713
      #15 0x00007f60ca1ff926 in mongo::(anonymous namespace)::<lambda()>::operator()(void) const (__closure=__closure@entry=0x7f60845ade30) at src/mongo/db/service_entry_point_mongod.cpp:829
      #16 0x00007f60ca2006a7 in mongo::(anonymous namespace)::runCommands (message=..., opCtx=0x7f60d6bf8cc0) at src/mongo/db/service_entry_point_mongod.cpp:836
      #17 mongo::ServiceEntryPointMongod::handleRequest (this=<optimized out>, opCtx=0x7f60d6bf8cc0, m=...) at src/mongo/db/service_entry_point_mongod.cpp:1090
      #18 0x00007f60cb36d7d4 in mongo::(anonymous namespace)::loopbackBuildResponse (opCtx=0x7f60d6bf8cc0, lastError=0x7f60845af2e8, toSend=...) at src/mongo/db/dbdirectclient.cpp:139
      #19 0x00007f60cb36de54 in mongo::DBDirectClient::call (this=<optimized out>, toSend=..., response=..., assertOk=<optimized out>, actualServer=<optimized out>) at src/mongo/db/dbdirectclient.cpp:144
      #20 0x00007f60cb448676 in mongo::DBClientCursor::init (this=this@entry=0x7f60d5987f00) at src/mongo/client/dbclientcursor.cpp:159
      #21 0x00007f60cb428d26 in mongo::DBClientBase::query (this=0x7f60845af270, ns=..., query=..., nToReturn=1, nToSkip=0, fieldsToReturn=0x0, queryOptions=0, batchSize=0) at src/mongo/client/dbclient.cpp:1090
      #22 0x00007f60cb36df50 in mongo::DBDirectClient::query (this=this@entry=0x7f60845af270, ns=..., query=..., nToReturn=nToReturn@entry=1, nToSkip=nToSkip@entry=0, fieldsToReturn=0x0, queryOptions=0, batchSize=0) at src/mongo/db/dbdirectclient.cpp:164
      #23 0x00007f60ca828c7f in mongo::shardmetadatautil::readShardCollectionsEntry (opCtx=opCtx@entry=0x7f60d6bf8cc0, nss=...) at src/mongo/db/s/shard_metadata_util.cpp:150
      #24 0x00007f60ca8327ef in mongo::(anonymous namespace)::getPersistedMetadataSinceVersion (opCtx=opCtx@entry=0x7f60d6bf8cc0, nss=..., version=..., okToReadWhileRefreshing=<optimized out>, this=<optimized out>, this=<optimized out>) at src/mongo/db/s/shar
      d_server_catalog_cache_loader.cpp:180
      #25 0x00007f60ca83523d in mongo::(anonymous namespace)::getIncompletePersistedMetadataSinceVersion (version=..., nss=..., opCtx=0x7f60d6bf8cc0) at src/mongo/db/s/shard_server_catalog_cache_loader.cpp:210
      #26 mongo::ShardServerCatalogCacheLoader::_getLoaderMetadata (this=<optimized out>, opCtx=opCtx@entry=0x7f60d6bf8cc0, nss=..., catalogCacheSinceVersion=..., term=<optimized out>) at src/mongo/db/s/shard_server_catalog_cache_loader.cpp:538
      #27 0x00007f60ca837a4c in mongo::ShardServerCatalogCacheLoader::<lambda(mongo::OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChunks>)>::operator()(mongo::OperationContext *, mongo::StatusWith<mongo::CatalogCacheLoad
      er::CollectionAndChangedChunks>) const (__closure=0x7f60d5959960, opCtx=0x7f60d6bf8cc0, swCollectionAndChangedChunks=StatusWith(OK, {<boost::optional_detail::optional_base<mongo::CatalogCacheLoader::CollectionAndChangedChunks>> = {<boost::optional_detai
      l::optional_tag> = {<No data fields>}, m_initialized = true, m_storage = {dummy_ = {data = "Y\341\020R<g.\341E\347~ٷ#9\265\b\325\300\326`\177\000\000\000\325\300\326`\177\000\000\214)\305\313`\177", '\000' <repeats 11 times>, "\377Z\204`\177\000\000\300
      И\325`\177\000\000hј\325`\177\000\000hј\325`\177\000", aligner_ = {<No data fields>}}}}, <No data fields>}), this=0x7f60d2249200, this=0x7f60d2249200, this=0x7f60d2249200, this=0x7f60d2249200) at src/mongo/db/s/shard_server_catalog_cache_loader.cpp:506
      #28 0x00007f60ca838064 in std::_Function_handler<void(mongo::OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChunks>), mongo::ShardServerCatalogCacheLoader::_schedulePrimaryGetChunksSince(mongo::OperationContext*, con
      st mongo::NamespaceString&, const mongo::ChunkVersion&, long long int, std::function<void(mongo::OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChunks>)>, std::shared_ptr<mongo::Notification<void> >)::<lambda(mongo::
      OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChunks>)> >::_M_invoke(const std::_Any_data &, <unknown type in /data/mci/3e56be335ba1cc85db897351175a89ed/src/mongod.debug, CU 0x104e001c, DIE 0x105a9508>, <unknown typ
      e in /data/mci/3e56be335ba1cc85db897351175a89ed/src/mongod.debug, CU 0x104e001c, DIE 0x105a950d>) (__functor=..., __args#0=<optimized out>, __args#1=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:1871
      #29 0x00007f60caa2766e in std::function<void (mongo::OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChunks>)>::operator()(mongo::OperationContext*, mongo::StatusWith<mongo::CatalogCacheLoader::CollectionAndChangedChu
      nks>) const (__args#1=<error reading variable: access outside bounds of object referenced via synthetic pointer>, __args#0=0x7f60d6bf8cc0, this=0x7f60d598a7d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #30 mongo::ConfigServerCatalogCacheLoader::<lambda()>::operator()(void) const (__closure=0x7f60d598a780) at src/mongo/s/config_server_catalog_cache_loader.cpp:172
      #31 0x00007f60caa31d6c in std::function<void ()>::operator()() const (this=0x7f60845b05b0) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/functional:2267
      #32 mongo::ThreadPool::_doOneTask (this=this@entry=0x7f60d5a65808, lk=lk@entry=0x7f60845b0690) at src/mongo/util/concurrency/thread_pool.cpp:347
      #33 0x00007f60caa3226c in mongo::ThreadPool::_consumeTasks (this=this@entry=0x7f60d5a65808) at src/mongo/util/concurrency/thread_pool.cpp:299
      #34 0x00007f60caa32c56 in mongo::ThreadPool::_workerThreadBody (pool=0x7f60d5a65808, threadName=...) at src/mongo/util/concurrency/thread_pool.cpp:247
      #35 0x00007f60cbaf88e0 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
      #36 0x00007f60c6c56aa1 in start_thread () from /lib64/libpthread.so.0
      

      MMAP V1 durability thread:

      Thread 6: "durability" (Thread 0x7f60be40e700 (LWP 3010))
      #0  0x00007f60c6c5aa5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x00007f60cb40e9b8 in __gthread_cond_timedwait (__abs_timeout=0x7f60be40c420, __mutex=<optimized out>, __cond=0x7f60ce7f69d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:871
      #2  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., this=0x7f60ce7f69d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:165
      #3  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=<synthetic pointer>..., this=0x7f60ce7f69d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:105
      #4  std::condition_variable::wait_until<std::chrono::_V2::system_clock, std::chrono::duration<long int, std::ratio<1l, 1000000000l> >, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds)::<lambda()> > (__p=..., __atime=..., __lock=<synthetic
      pointer>..., this=0x7f60ce7f69d8) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/condition_variable:128
      #5  std::condition_variable::wait_for<long int, std::ratio<1l, 1000000000l>, mongo::CondVarLockGrantNotification::wait(mongo::Milliseconds)::<lambda()> > (__p=..., __rtime=..., __lock=<synthetic pointer>..., this=0x7f60ce7f69d8) at /opt/mongodbtoolchain
      /v2/include/c++/5.4.0/condition_variable:144
      #6  mongo::CondVarLockGrantNotification::wait (this=this@entry=0x7f60ce7f69a8, timeout=..., timeout@entry=...) at src/mongo/db/concurrency/lock_state.cpp:226
      #7  0x00007f60cb412e7e in mongo::LockerImpl<true>::lockComplete (this=0x7f60ce7f6400, resId=..., mode=<optimized out>, timeout=..., checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:747
      #8  0x00007f60cb40e67e in mongo::AutoAcquireFlushLockForMMAPV1Commit::upgradeFlushLockToExclusive (this=this@entry=0x7f60be40c7d0) at src/mongo/db/concurrency/lock_state.cpp:900
      #9  0x00007f60ca6bb869 in mongo::dur::durThread (cs=0x7f60ce68e3c0, serverStartMs=1507921995999) at src/mongo/db/storage/mmap_v1/dur.cpp:831
      #10 0x00007f60cbaf88e0 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
      #11 0x00007f60c6c56aa1 in start_thread () from /lib64/libpthread.so.0
      

        Attachments

          Activity

            People

            Assignee:
            jack.mulrow Jack Mulrow
            Reporter:
            kaloian.manassiev Kaloian Manassiev
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: