Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-21965

Deadlock between drop database and list collection metadata for KVCatalog-based storage engines without document locking

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Storage
    • Labels:
    • Fully Compatible
    • ALL
    • Storage 2017-01-23
    • 0

      Storage engines which use the KVCatalog infrastructure, but do not support document-level locking (such as ephemeralForTest) may experience deadlock between drop database and listing the collection metadata because of reverse order of lock acquisition for the METADATA and oplog COLLECTION locks. This bug does not apply to MMAPV1 (which does not support document-level locking, but also does not use the KVCatalog) or for WiredTiger.

      Here is an example of the inverse order (Collection, 472559462826177455 is the oplog collection):

      TTLMonitor:
      
      HOLD: Global, 1, IS
      HOLD: Global, 2, IS
      HOLD: Database, 1658880705677372363, IS
      HOLD: Collection, 472559462826177455, S
      WAIT ON: Metadata, S
      
      DropDB:
      
      HOLD: Global, 1, IS
      HOLD: Global, 2, IX
      HOLD: Database, 899920359141007773, X
      HOLD: Database, 1658880705677372363, IX
      HOLD: Metadata, X
      WAIT ON: Collection, 472559462826177455, X
      

      Here are more details from the lock manager:

      LockRequest 13 – OpObserver
      LockRequest 11 – TTLMonitor

      Lock @ 0x601800009040: {11529215046068469761: Metadata, 1}
      GRANTED:
         LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      PENDING:
         LockRequest 11 @ 0x60580009CC80: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      -----------------------------------------------------------
      
      Lock @ 0x601800009580: {2305843009213693953: Global, 1}
      GRANTED:
         LockRequest 18 @ 0x60580010B080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 17 @ 0x6058000FB480: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 16 @ 0x6058000EB880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 12 @ 0x6058000AC880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 14 @ 0x6058000CC080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 13 @ 0x6058000BC480: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 15 @ 0x6058000DBC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      PENDING:
      -----------------------------------------------------------
      
      Lock @ 0x6018000094C0: {2305843009213693954: Global, 2}
      GRANTED:
         LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 18 @ 0x60580010B080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 17 @ 0x6058000FB480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 16 @ 0x6058000EB880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 12 @ 0x6058000AC880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 14 @ 0x6058000CC080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 13 @ 0x6058000BC480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 15 @ 0x6058000DBC80: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      PENDING:
      -----------------------------------------------------------
      
      Lock @ 0x6018000E3D40: {7817449386782089629: Database, 899920359141007773}
      GRANTED:
         LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      PENDING:
         LockRequest 18 @ 0x60580010B080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 14 @ 0x6058000CC080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 17 @ 0x6058000FB480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 15 @ 0x6058000DBC80: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 12 @ 0x6058000AC880: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 16 @ 0x6058000EB880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      -----------------------------------------------------------
      
      Lock @ 0x60180006B8C0: {9695931499680953263: Collection, 472559462826177455}
      GRANTED:
         LockRequest 11 @ 0x60580009CC80: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      PENDING:
         LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 10 @ 0x60580008D080: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      -----------------------------------------------------------
      
      Lock @ 0x6018000091C0: {8576409733318454219: Database, 1658880705677372363}
      GRANTED:
         LockRequest 13 @ 0x6058000BC480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
         LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
      

      Threads:

      Thread 7 (Thread 0x7f7f5ed9d700 (LWP 25080)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000001e30310 in __gthread_cond_timedwait (__cond=0x6058000bca50, __mutex=0x6058000bca28, __abs_timeout=0x7f7f5ed99d20) at /usr/include/x86_64-linux-gnu/c++/4.8/bits/gthr-default.h:871
      #2  0x0000000001e449c5 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x6058000bca50, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:160
      #3  0x0000000001e3f6d3 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x6058000bca50, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:100
      #4  0x000000000207bbfd in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x6058000bca50, __lock=..., __rtime=...) at /usr/include/c++/4.8/condition_variable:132
      #5  0x000000000207a713 in mongo::CondVarLockGrantNotification::wait (this=0x6058000bca20, timeoutMs=500) at src/mongo/db/concurrency/lock_state.cpp:229
      #6  0x0000000002082850 in mongo::LockerImpl<false>::lockComplete (this=0x6058000bc480, resId=..., mode=mongo::MODE_X, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:680
      #7  0x00000000020809c9 in mongo::LockerImpl<false>::lock (this=0x6058000bc480, resId=..., mode=mongo::MODE_X, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:396
      #8  0x0000000002065869 in mongo::Lock::CollectionLock::CollectionLock (this=0x7f7f5ed9a250, lockState=0x6058000bc480, ns=..., mode=mongo::MODE_IX) at src/mongo/db/concurrency/d_concurrency.cpp:141
      #9  0x00000000026088cc in mongo::repl::_logOp (txn=0x6016001725d0, opstr=0x437f480 "c", ns=0x60080077cc68 "test.$cmd", obj=..., o2=0x0, fromMigrate=false, oplogName=..., replMode=mongo::repl::ReplicationCoordinator::modeReplSet, updateOpTime=true) at src/mongo/db/repl/oplog.cpp:416
      #10 0x00000000026091bb in mongo::repl::logOp (txn=0x6016001725d0, opstr=0x437f480 "c", ns=0x60080077cc68 "test.$cmd", obj=..., o2=0x0, fromMigrate=false) at src/mongo/db/repl/oplog.cpp:459
      #11 0x0000000002314060 in mongo::OpObserver::onDropCollection (this=0x6004000056d0, txn=0x6016001725d0, collectionName=...) at src/mongo/db/op_observer.cpp:160
      #12 0x0000000001f40e34 in mongo::Database::dropCollection (this=0x600c00185a20, txn=0x6016001725d0, fullns=...) at src/mongo/db/catalog/database.cpp:409
      #13 0x0000000001f4d68e in mongo::dropCollection (txn=0x6016001725d0, collectionName=..., result=...) at src/mongo/db/catalog/drop_collection.cpp:90
      #14 0x000000000209d677 in mongo::CmdDrop::run (this=0x598b8c0 <mongo::cmdDrop>, txn=0x6016001725d0, dbname=..., cmdObj=..., errmsg=..., result=...) at src/mongo/db/dbcommands.cpp:479
      #15 0x0000000002099c69 in mongo::Command::run (this=0x598b8c0 <mongo::cmdDrop>, txn=0x6016001725d0, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/dbcommands.cpp:1432
      #16 0x000000000209867d in mongo::Command::execCommand (txn=0x6016001725d0, command=0x598b8c0 <mongo::cmdDrop>, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/dbcommands.cpp:1317
      #17 0x0000000001f8969a in mongo::runCommands (txn=0x6016001725d0, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/commands.cpp:498
      #18 0x000000000229f80d in mongo::(anonymous namespace)::receivedRpc (txn=0x6016001725d0, client=..., dbResponse=..., message=...) at src/mongo/db/instance.cpp:304
      #19 0x00000000022a107a in mongo::assembleResponse (txn=0x6016001725d0, m=..., dbresponse=..., remote=...) at src/mongo/db/instance.cpp:533
      #20 0x0000000001d16b7d in mongo::MyMessageHandler::process (this=0x600400005670, m=..., port=0x600e000198c0) at src/mongo/db/db.cpp:171
      #21 0x0000000002ccd6b3 in mongo::PortMessageServer::handleIncomingMsg (arg=0x600e000198c0) at src/mongo/util/net/message_server_port.cpp:229
      #22 0x00007f7f69d7fb98 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
      #23 0x00007f7f68f25182 in start_thread (arg=0x7f7f5ed9d700) at pthread_create.c:312
      #24 0x00007f7f68c5247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 57 (Thread 0x7f7f655d6700 (LWP 25017)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000001e30310 in __gthread_cond_timedwait (__cond=0x60580009d250, __mutex=0x60580009d228, __abs_timeout=0x7f7f655d4760) at /usr/include/x86_64-linux-gnu/c++/4.8/bits/gthr-default.h:871
      #2  0x0000000001e449c5 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x60580009d250, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:160
      #3  0x0000000001e3f6d3 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x60580009d250, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:100
      #4  0x000000000207bbfd in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x60580009d250, __lock=..., __rtime=...) at /usr/include/c++/4.8/condition_variable:132
      #5  0x000000000207a713 in mongo::CondVarLockGrantNotification::wait (this=0x60580009d220, timeoutMs=500) at src/mongo/db/concurrency/lock_state.cpp:229
      #6  0x0000000002082850 in mongo::LockerImpl<false>::lockComplete (this=0x60580009cc80, resId=..., mode=mongo::MODE_S, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:680
      #7  0x00000000020809c9 in mongo::LockerImpl<false>::lock (this=0x60580009cc80, resId=..., mode=mongo::MODE_S, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:396
      #8  0x0000000002065e2a in mongo::Lock::ResourceLock::lock (this=0x6006000d6840, mode=mongo::MODE_S) at src/mongo/db/concurrency/d_concurrency.cpp:186
      #9  0x0000000001f18dae in mongo::Lock::ResourceLock::ResourceLock (this=0x6006000d6840, locker=0x60580009cc80, rid=..., mode=mongo::MODE_S) at src/mongo/db/concurrency/d_concurrency.h:82
      #10 0x0000000002806378 in mongo::KVCatalog::_findEntry (this=0x601800009940, opCtx=0x7f7f655d55d0, ns=..., out=0x0) at src/mongo/db/storage/kv/kv_catalog.cpp:214
      #11 0x00000000028068e6 in mongo::KVCatalog::getMetaData (this=0x601800009940, opCtx=0x7f7f655d55d0, ns=...) at src/mongo/db/storage/kv/kv_catalog.cpp:242
      #12 0x0000000002810066 in mongo::KVCollectionCatalogEntry::_getMetaData (this=0x600c00089360, txn=0x7f7f655d55d0) at src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp:179
      #13 0x00000000027d9fdf in mongo::BSONCollectionCatalogEntry::getAllIndexes (this=0x600c00089360, txn=0x7f7f655d55d0, names=0x7f7f655d52e0) at src/mongo/db/storage/bson_collection_catalog_entry.cpp:72
      #14 0x000000000295e90d in mongo::TTLMonitor::getTTLIndexesForDB (this=0x6006000c35e0, txn=0x7f7f655d55d0, dbName=..., indexes=0x7f7f655d5530) at src/mongo/db/ttl.cpp:190
      #15 0x000000000295e09d in mongo::TTLMonitor::doTTLPass (this=0x6006000c35e0) at src/mongo/db/ttl.cpp:140
      #16 0x000000000295dc9f in mongo::TTLMonitor::run (this=0x6006000c35e0) at src/mongo/db/ttl.cpp:113
      #17 0x0000000002c9a938 in mongo::BackgroundJob::jobBody (this=0x6006000c35e0) at src/mongo/util/background.cpp:152
      #18 0x0000000002c9f5ff in std::_Mem_fn<void (mongo::BackgroundJob::*)()>::operator()<, void>(mongo::BackgroundJob*) const (this=0x600e0001a0d0, __object=0x6006000c35e0) at /usr/include/c++/4.8/functional:601
      #19 0x0000000002c9f498 in std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (this=0x600e0001a0d0, __args=<unknown type in /home/s/code/mongo/mongo/mongod, CU 0xb1ab630, DIE 0xb1c87c2>) at /usr/include/c++/4.8/functional:1296
      #20 0x0000000002c9f3f4 in std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)>::operator()<, void>() (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1355
      #21 0x0000000002c9f346 in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1732
      #22 0x0000000002c9f29d in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()>::operator()() (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1720
      #23 0x0000000002c9f236 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()> >::_M_run() (this=0x600e0001a0b8) at /usr/include/c++/4.8/thread:115
      #24 0x00007f7f69708a40 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:84
      #25 0x00007f7f69d7fb98 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
      #26 0x00007f7f68f25182 in start_thread (arg=0x7f7f655d6700) at pthread_create.c:312
      #27 0x00007f7f68c5247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      

      Version: a3d8fbcadfeae8418ba17e5e90e51b929ff3ff93

            Assignee:
            daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)
            Reporter:
            kamran.khan Kamran K.
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: