[SERVER-21965] Deadlock between drop database and list collection metadata for KVCatalog-based storage engines without document locking Created: 20/Dec/15  Updated: 27/Jan/17  Resolved: 15/Dec/16

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kamran K. Assignee: Daniel Gottlieb (Inactive)
Resolution: Done Votes: 0
Labels: fuzzer-blocker
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-27831 Deadlock when listing collections on ... Closed
related to SERVER-23989 Disable the concurrency and fuzzer su... Closed
related to SERVER-23994 Re-enable the concurrency and fuzzer ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage 2017-01-23
Participants:
Linked BF Score: 0

 Description   

Storage engines that use the KVCatalog infrastructure but do not support document-level locking (such as ephemeralForTest) can deadlock between a database drop and a listing of collection metadata, because the METADATA and oplog COLLECTION locks are acquired in opposite orders. This bug does not apply to MMAPv1 (which does not support document-level locking, but also does not use the KVCatalog), nor to WiredTiger.

Here is an example of the inverted acquisition order (Collection 472559462826177455 is the oplog collection):

TTLMonitor:
 
HOLD: Global, 1, IS
HOLD: Global, 2, IS
HOLD: Database, 1658880705677372363, IS
HOLD: Collection, 472559462826177455, S
WAIT ON: Metadata, S
 
DropDB:
 
HOLD: Global, 1, IS
HOLD: Global, 2, IX
HOLD: Database, 899920359141007773, X
HOLD: Database, 1658880705677372363, IX
HOLD: Metadata, X
WAIT ON: Collection, 472559462826177455, X

Here are more details from the lock manager:

LockRequest 13 – OpObserver
LockRequest 11 – TTLMonitor

Lock @ 0x601800009040: {11529215046068469761: Metadata, 1}
GRANTED:
   LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
PENDING:
   LockRequest 11 @ 0x60580009CC80: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
-----------------------------------------------------------
 
Lock @ 0x601800009580: {2305843009213693953: Global, 1}
GRANTED:
   LockRequest 18 @ 0x60580010B080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 17 @ 0x6058000FB480: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 16 @ 0x6058000EB880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 12 @ 0x6058000AC880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 14 @ 0x6058000CC080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 13 @ 0x6058000BC480: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 15 @ 0x6058000DBC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
PENDING:
-----------------------------------------------------------
 
Lock @ 0x6018000094C0: {2305843009213693954: Global, 2}
GRANTED:
   LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 18 @ 0x60580010B080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 17 @ 0x6058000FB480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 16 @ 0x6058000EB880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 12 @ 0x6058000AC880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 14 @ 0x6058000CC080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 13 @ 0x6058000BC480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 15 @ 0x6058000DBC80: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
PENDING:
-----------------------------------------------------------
 
Lock @ 0x6018000E3D40: {7817449386782089629: Database, 899920359141007773}
GRANTED:
   LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
PENDING:
   LockRequest 18 @ 0x60580010B080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 14 @ 0x6058000CC080: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 17 @ 0x6058000FB480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 15 @ 0x6058000DBC80: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 12 @ 0x6058000AC880: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 2 @ 0x60580002E880: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 16 @ 0x6058000EB880: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
-----------------------------------------------------------
 
Lock @ 0x60180006B8C0: {9695931499680953263: Collection, 472559462826177455}
GRANTED:
   LockRequest 11 @ 0x60580009CC80: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
PENDING:
   LockRequest 13 @ 0x6058000BC480: Mode = X; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 10 @ 0x60580008D080: Mode = S; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
-----------------------------------------------------------
 
Lock @ 0x6018000091C0: {8576409733318454219: Database, 1658880705677372363}
GRANTED:
   LockRequest 13 @ 0x6058000BC480: Mode = IX; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 11 @ 0x60580009CC80: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;
   LockRequest 10 @ 0x60580008D080: Mode = IS; ConvertMode = NONE; EnqueueAtFront = 0; CompatibleFirst = 0;

Threads:

Thread 7 (Thread 0x7f7f5ed9d700 (LWP 25080)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001e30310 in __gthread_cond_timedwait (__cond=0x6058000bca50, __mutex=0x6058000bca28, __abs_timeout=0x7f7f5ed99d20) at /usr/include/x86_64-linux-gnu/c++/4.8/bits/gthr-default.h:871
#2  0x0000000001e449c5 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x6058000bca50, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:160
#3  0x0000000001e3f6d3 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x6058000bca50, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:100
#4  0x000000000207bbfd in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x6058000bca50, __lock=..., __rtime=...) at /usr/include/c++/4.8/condition_variable:132
#5  0x000000000207a713 in mongo::CondVarLockGrantNotification::wait (this=0x6058000bca20, timeoutMs=500) at src/mongo/db/concurrency/lock_state.cpp:229
#6  0x0000000002082850 in mongo::LockerImpl<false>::lockComplete (this=0x6058000bc480, resId=..., mode=mongo::MODE_X, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:680
#7  0x00000000020809c9 in mongo::LockerImpl<false>::lock (this=0x6058000bc480, resId=..., mode=mongo::MODE_X, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:396
#8  0x0000000002065869 in mongo::Lock::CollectionLock::CollectionLock (this=0x7f7f5ed9a250, lockState=0x6058000bc480, ns=..., mode=mongo::MODE_IX) at src/mongo/db/concurrency/d_concurrency.cpp:141
#9  0x00000000026088cc in mongo::repl::_logOp (txn=0x6016001725d0, opstr=0x437f480 "c", ns=0x60080077cc68 "test.$cmd", obj=..., o2=0x0, fromMigrate=false, oplogName=..., replMode=mongo::repl::ReplicationCoordinator::modeReplSet, updateOpTime=true) at src/mongo/db/repl/oplog.cpp:416
#10 0x00000000026091bb in mongo::repl::logOp (txn=0x6016001725d0, opstr=0x437f480 "c", ns=0x60080077cc68 "test.$cmd", obj=..., o2=0x0, fromMigrate=false) at src/mongo/db/repl/oplog.cpp:459
#11 0x0000000002314060 in mongo::OpObserver::onDropCollection (this=0x6004000056d0, txn=0x6016001725d0, collectionName=...) at src/mongo/db/op_observer.cpp:160
#12 0x0000000001f40e34 in mongo::Database::dropCollection (this=0x600c00185a20, txn=0x6016001725d0, fullns=...) at src/mongo/db/catalog/database.cpp:409
#13 0x0000000001f4d68e in mongo::dropCollection (txn=0x6016001725d0, collectionName=..., result=...) at src/mongo/db/catalog/drop_collection.cpp:90
#14 0x000000000209d677 in mongo::CmdDrop::run (this=0x598b8c0 <mongo::cmdDrop>, txn=0x6016001725d0, dbname=..., cmdObj=..., errmsg=..., result=...) at src/mongo/db/dbcommands.cpp:479
#15 0x0000000002099c69 in mongo::Command::run (this=0x598b8c0 <mongo::cmdDrop>, txn=0x6016001725d0, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/dbcommands.cpp:1432
#16 0x000000000209867d in mongo::Command::execCommand (txn=0x6016001725d0, command=0x598b8c0 <mongo::cmdDrop>, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/dbcommands.cpp:1317
#17 0x0000000001f8969a in mongo::runCommands (txn=0x6016001725d0, request=..., replyBuilder=0x7f7f5ed9b8e0) at src/mongo/db/commands.cpp:498
#18 0x000000000229f80d in mongo::(anonymous namespace)::receivedRpc (txn=0x6016001725d0, client=..., dbResponse=..., message=...) at src/mongo/db/instance.cpp:304
#19 0x00000000022a107a in mongo::assembleResponse (txn=0x6016001725d0, m=..., dbresponse=..., remote=...) at src/mongo/db/instance.cpp:533
#20 0x0000000001d16b7d in mongo::MyMessageHandler::process (this=0x600400005670, m=..., port=0x600e000198c0) at src/mongo/db/db.cpp:171
#21 0x0000000002ccd6b3 in mongo::PortMessageServer::handleIncomingMsg (arg=0x600e000198c0) at src/mongo/util/net/message_server_port.cpp:229
#22 0x00007f7f69d7fb98 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
#23 0x00007f7f68f25182 in start_thread (arg=0x7f7f5ed9d700) at pthread_create.c:312
#24 0x00007f7f68c5247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 57 (Thread 0x7f7f655d6700 (LWP 25017)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001e30310 in __gthread_cond_timedwait (__cond=0x60580009d250, __mutex=0x60580009d228, __abs_timeout=0x7f7f655d4760) at /usr/include/x86_64-linux-gnu/c++/4.8/bits/gthr-default.h:871
#2  0x0000000001e449c5 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x60580009d250, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:160
#3  0x0000000001e3f6d3 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x60580009d250, __lock=..., __atime=...) at /usr/include/c++/4.8/condition_variable:100
#4  0x000000000207bbfd in std::condition_variable::wait_for<long, std::ratio<1l, 1000l> > (this=0x60580009d250, __lock=..., __rtime=...) at /usr/include/c++/4.8/condition_variable:132
#5  0x000000000207a713 in mongo::CondVarLockGrantNotification::wait (this=0x60580009d220, timeoutMs=500) at src/mongo/db/concurrency/lock_state.cpp:229
#6  0x0000000002082850 in mongo::LockerImpl<false>::lockComplete (this=0x60580009cc80, resId=..., mode=mongo::MODE_S, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:680
#7  0x00000000020809c9 in mongo::LockerImpl<false>::lock (this=0x60580009cc80, resId=..., mode=mongo::MODE_S, timeoutMs=4294967295, checkDeadlock=false) at src/mongo/db/concurrency/lock_state.cpp:396
#8  0x0000000002065e2a in mongo::Lock::ResourceLock::lock (this=0x6006000d6840, mode=mongo::MODE_S) at src/mongo/db/concurrency/d_concurrency.cpp:186
#9  0x0000000001f18dae in mongo::Lock::ResourceLock::ResourceLock (this=0x6006000d6840, locker=0x60580009cc80, rid=..., mode=mongo::MODE_S) at src/mongo/db/concurrency/d_concurrency.h:82
#10 0x0000000002806378 in mongo::KVCatalog::_findEntry (this=0x601800009940, opCtx=0x7f7f655d55d0, ns=..., out=0x0) at src/mongo/db/storage/kv/kv_catalog.cpp:214
#11 0x00000000028068e6 in mongo::KVCatalog::getMetaData (this=0x601800009940, opCtx=0x7f7f655d55d0, ns=...) at src/mongo/db/storage/kv/kv_catalog.cpp:242
#12 0x0000000002810066 in mongo::KVCollectionCatalogEntry::_getMetaData (this=0x600c00089360, txn=0x7f7f655d55d0) at src/mongo/db/storage/kv/kv_collection_catalog_entry.cpp:179
#13 0x00000000027d9fdf in mongo::BSONCollectionCatalogEntry::getAllIndexes (this=0x600c00089360, txn=0x7f7f655d55d0, names=0x7f7f655d52e0) at src/mongo/db/storage/bson_collection_catalog_entry.cpp:72
#14 0x000000000295e90d in mongo::TTLMonitor::getTTLIndexesForDB (this=0x6006000c35e0, txn=0x7f7f655d55d0, dbName=..., indexes=0x7f7f655d5530) at src/mongo/db/ttl.cpp:190
#15 0x000000000295e09d in mongo::TTLMonitor::doTTLPass (this=0x6006000c35e0) at src/mongo/db/ttl.cpp:140
#16 0x000000000295dc9f in mongo::TTLMonitor::run (this=0x6006000c35e0) at src/mongo/db/ttl.cpp:113
#17 0x0000000002c9a938 in mongo::BackgroundJob::jobBody (this=0x6006000c35e0) at src/mongo/util/background.cpp:152
#18 0x0000000002c9f5ff in std::_Mem_fn<void (mongo::BackgroundJob::*)()>::operator()<, void>(mongo::BackgroundJob*) const (this=0x600e0001a0d0, __object=0x6006000c35e0) at /usr/include/c++/4.8/functional:601
#19 0x0000000002c9f498 in std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (this=0x600e0001a0d0, __args=<unknown type in /home/s/code/mongo/mongo/mongod, CU 0xb1ab630, DIE 0xb1c87c2>) at /usr/include/c++/4.8/functional:1296
#20 0x0000000002c9f3f4 in std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)>::operator()<, void>() (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1355
#21 0x0000000002c9f346 in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()>::_M_invoke<>(std::_Index_tuple<>) (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1732
#22 0x0000000002c9f29d in std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()>::operator()() (this=0x600e0001a0d0) at /usr/include/c++/4.8/functional:1720
#23 0x0000000002c9f236 in std::thread::_Impl<std::_Bind_simple<std::_Bind<std::_Mem_fn<void (mongo::BackgroundJob::*)()> (mongo::BackgroundJob*)> ()> >::_M_run() (this=0x600e0001a0b8) at /usr/include/c++/4.8/thread:115
#24 0x00007f7f69708a40 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../src/libstdc++-v3/src/c++11/thread.cc:84
#25 0x00007f7f69d7fb98 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.0
#26 0x00007f7f68f25182 in start_thread (arg=0x7f7f655d6700) at pthread_create.c:312
#27 0x00007f7f68c5247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


Version: a3d8fbcadfeae8418ba17e5e90e51b929ff3ff93



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 15/Dec/16 ]

The deadlocks reproducible by the script went away with SERVER-24631.

Background:
This deadlock required the TTL monitor's query of the storage engine for the indexes on the oplog collection to race with the replication of collection creates/drops (and likely also index creates/drops, or anything else that temporarily needs the KVEngine metadata lock). Because the metadata lock is not released until a WriteUnitOfWork commits or aborts, lock acquisition orders that are legal in isolation can become illegal in the context of a WriteUnitOfWork.

That patch made the TTL thread smarter about which collections to check, and the oplog collection is now omitted (unless an application were to add a TTL index to it).

It may also be worth noting that this script also caused a deadlock between the TTL monitor examining the oplog collection and the replication of an insert that creates a new collection. I believe that was independently alleviated by SERVER-23128: after that refactor, the collection creation (in the KVEngine memory structures) no longer happened in the WriteUnitOfWork containing the inserted document, so the Metadata lock was not held when the insert replicated.

It's likely there are other races over the KVEngine metadata lock, which we can investigate separately as they are discovered.

Generated at Thu Feb 08 03:58:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.