[SERVER-16246] Deadlock with the WiredTiger storage engine between ListDatabases and DropCollection Created: 19/Nov/14  Updated: 11/Jul/16  Resolved: 20/Nov/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.8.0-rc1

Type: Bug Priority: Blocker - P1
Reporter: Kaloian Manassiev Assignee: Matt Kangas
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

 [2014/11/19 17:34:06.751] Thread 8 (Thread 0x7fd723ba2700 (LWP 20786)):
 [2014/11/19 17:34:06.751] #0  0x000000000138fd3e in __wt_readlock ()
 [2014/11/19 17:34:06.751] #1  0x00000000013abdaa in __wt_session_lock_dhandle ()
 [2014/11/19 17:34:06.751] #2  0x00000000013ac128 in __wt_session_get_btree ()
 [2014/11/19 17:34:06.751] #3  0x000000000135bf7f in __wt_conn_btree_apply ()
 [2014/11/19 17:34:06.752] #4  0x0000000001371474 in __wt_curstat_init ()
 [2014/11/19 17:34:06.752] #5  0x00000000013a7a06 in __wt_curstat_colgroup_init ()
 [2014/11/19 17:34:06.752] #6  0x00000000013712a5 in __wt_curstat_init ()
 [2014/11/19 17:34:06.752] #7  0x0000000001371749 in __wt_curstat_open ()
 [2014/11/19 17:34:06.752] #8  0x00000000013a7be6 in __wt_curstat_table_init ()
 [2014/11/19 17:34:06.752] #9  0x00000000013714c1 in __wt_curstat_init ()
 [2014/11/19 17:34:06.752] #10 0x0000000001371749 in __wt_curstat_open ()
 [2014/11/19 17:34:06.752] #11 0x00000000013aaf85 in __wt_open_cursor ()
 [2014/11/19 17:34:06.752] #12 0x00000000013ab355 in ?? ()
 [2014/11/19 17:34:06.753] #13 0x0000000000da8c2c in mongo::WiredTigerUtil::exportTableToBSON(__wt_session*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObjBuilder*) ()
 [2014/11/19 17:34:06.753] #14 0x0000000000d9e165 in mongo::WiredTigerRecordStore::appendCustomStats(mongo::OperationContext*, mongo::BSONObjBuilder*, double) const ()
 [2014/11/19 17:34:06.753] #15 0x0000000000d9d181 in mongo::WiredTigerRecordStore::storageSize(mongo::OperationContext*, mongo::BSONObjBuilder*, int) const ()
 [2014/11/19 17:34:06.753] #16 0x0000000000d2810e in mongo::KVDatabaseCatalogEntry::sizeOnDisk(mongo::OperationContext*) const ()
 [2014/11/19 17:34:06.753] #17 0x000000000096ebe0 in mongo::CmdListDatabases::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.753] #18 0x00000000009d34c4 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.753] #19 0x00000000009d4362 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.753] #20 0x00000000009d4ed0 in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
 [2014/11/19 17:34:06.754] #21 0x0000000000bd2ecc in mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool) ()
 [2014/11/19 17:34:06.754] #22 0x0000000000ab9533 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) ()
 [2014/11/19 17:34:06.754] #23 0x00000000007fd820 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
 [2014/11/19 17:34:06.754] #24 0x0000000000f3b4a1 in mongo::PortMessageServer::handleIncomingMsg(void*) ()

and

[2014/11/19 17:34:06.757] Thread 5 (Thread 0x7fd723ea5700 (LWP 20791)):
 [2014/11/19 17:34:06.757] #0  0x0000003c6d40e264 in __lll_lock_wait () from /lib64/libpthread.so.0
 [2014/11/19 17:34:06.757] #1  0x0000003c6d409508 in _L_lock_854 () from /lib64/libpthread.so.0
 [2014/11/19 17:34:06.757] #2  0x0000003c6d4093d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
 [2014/11/19 17:34:06.757] #3  0x000000000132440d in __wt_evict_file_exclusive_on ()
 [2014/11/19 17:34:06.758] #4  0x0000000001332138 in __wt_cache_op ()
 [2014/11/19 17:34:06.758] #5  0x00000000013b5dfb in __wt_checkpoint_close ()
 [2014/11/19 17:34:06.758] #6  0x000000000135b97b in __wt_conn_btree_sync_and_close ()
 [2014/11/19 17:34:06.758] #7  0x000000000135c0fa in __wt_conn_dhandle_close_all ()
 [2014/11/19 17:34:06.758] #8  0x000000000139bd58 in __wt_schema_drop ()
 [2014/11/19 17:34:06.758] #9  0x000000000139bc3f in __wt_schema_drop ()
 [2014/11/19 17:34:06.758] #10 0x00000000013a9503 in ?? ()
 [2014/11/19 17:34:06.758] #11 0x0000000000d97c04 in mongo::WiredTigerKVEngine::_drop(mongo::StringData const&) ()
 [2014/11/19 17:34:06.758] #12 0x0000000000d97dff in mongo::WiredTigerKVEngine::dropIdent(mongo::OperationContext*, mongo::StringData const&) ()
 [2014/11/19 17:34:06.759] #13 0x0000000000d27b1a in mongo::KVCollectionCatalogEntry::RemoveIndexChange::commit() ()
 [2014/11/19 17:34:06.759] #14 0x0000000000d9fdb5 in mongo::WiredTigerRecoveryUnit::_commit() ()
 [2014/11/19 17:34:06.759] #15 0x00000000009117cc in mongo::WriteUnitOfWork::commit() ()
 [2014/11/19 17:34:06.759] #16 0x00000000009dc2ca in mongo::CmdDrop::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.759] #17 0x00000000009d34c4 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.759] #18 0x00000000009d4362 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/19 17:34:06.759] #19 0x00000000009d4ed0 in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
 [2014/11/19 17:34:06.759] #20 0x0000000000bd2ecc in mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool) ()
 [2014/11/19 17:34:06.759] #21 0x0000000000ab9533 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) ()
 [2014/11/19 17:34:06.760] #22 0x00000000007fd820 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
 [2014/11/19 17:34:06.760] #23 0x0000000000f3b4a1 in mongo::PortMessageServer::handleIncomingMsg(void*) ()



 Comments   
Comment by Eric Milkie [ 20/Nov/14 ]

Presumably fixed by:
commit def8f54bf6162317cc8b345e81c6e698d618ad96
Author: Matt Kangas <matt.kangas@mongodb.com>
Date: Thu Nov 20 08:31:42 2014 -0500

Import wiredtiger-wiredtiger-2.4.0-399-g225aa81.tar.gz from wiredtiger branch mongodb-2.8

Comment by Eric Milkie [ 20/Nov/14 ]

Here is a third example of the hang, with callstacks similar to the ones in the original Description:
https://mci.10gen.com/task_log_raw/mongodb_mongo_master_linux_64_442d1dc06fd8d04e27a2838995f1eef8bf87d27a_14_11_20_00_22_08_parallel_compatibility_WT_linux_64/0?type=T

Comment by Eric Milkie [ 20/Nov/14 ]

The above test failure can be found at: https://mci.10gen.com/task_log_raw/mongodb_mongo_master_linux_64_442d1dc06fd8d04e27a2838995f1eef8bf87d27a_14_11_20_00_22_08_parallel_WT_linux_64/0?type=T

Comment by Eric Milkie [ 20/Nov/14 ]

The same jstest in another run managed to hang in a different place, where just one thread was stuck; when the hang analyzer ran, it found this callstack:

[2014/11/20 00:37:58.639] Thread 16 (Thread 0x48af2940 (LWP 15776)):
 [2014/11/20 00:37:58.639] #0  0x0000000001346659 in __wt_conn_dhandle_find ()
 [2014/11/20 00:37:58.639] #1  0x00000000013971e2 in __wt_session_get_btree ()
 [2014/11/20 00:37:58.639] #2  0x0000000001346fdf in __wt_conn_btree_apply ()
 [2014/11/20 00:37:58.639] #3  0x000000000135c4d4 in __wt_curstat_init ()
 [2014/11/20 00:37:58.639] #4  0x0000000001392a86 in __wt_curstat_colgroup_init ()
 [2014/11/20 00:37:58.639] #5  0x000000000135c305 in __wt_curstat_init ()
 [2014/11/20 00:37:58.639] #6  0x000000000135c7a9 in __wt_curstat_open ()
 [2014/11/20 00:37:58.639] #7  0x0000000001392c66 in __wt_curstat_table_init ()
 [2014/11/20 00:37:58.639] #8  0x000000000135c521 in __wt_curstat_init ()
 [2014/11/20 00:37:58.639] #9  0x000000000135c7a9 in __wt_curstat_open ()
 [2014/11/20 00:37:58.639] #10 0x0000000001396005 in __wt_open_cursor ()
 [2014/11/20 00:37:58.639] #11 0x00000000013963d5 in ?? ()
 [2014/11/20 00:37:58.639] #12 0x0000000000d9c10c in mongo::WiredTigerUtil::exportTableToBSON(__wt_session*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObjBuilder*) ()
 [2014/11/20 00:37:58.639] #13 0x0000000000d92169 in mongo::WiredTigerRecordStore::validate(mongo::OperationContext*, bool, bool, mongo::ValidateAdaptor*, mongo::ValidateResults*, mongo::BSONObjBuilder*) const ()
 [2014/11/20 00:37:58.639] #14 0x00000000008fe10d in mongo::Collection::validate(mongo::OperationContext*, bool, bool, mongo::ValidateResults*, mongo::BSONObjBuilder*) ()
 [2014/11/20 00:37:58.639] #15 0x000000000099e7d2 in mongo::ValidateCmd::run(mongo::OperationContext*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/20 00:37:58.639] #16 0x00000000009c1564 in mongo::_execCommand(mongo::OperationContext*, mongo::Command*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/20 00:37:58.639] #17 0x00000000009c2402 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
 [2014/11/20 00:37:58.639] #18 0x00000000009c2f70 in mongo::_runCommands(mongo::OperationContext*, char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
 [2014/11/20 00:37:58.639] #19 0x0000000000bc3e6c in mongo::newRunQuery(mongo::OperationContext*, mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&, bool) ()
 [2014/11/20 00:37:58.639] #20 0x0000000000aa82e3 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&, bool) ()
 [2014/11/20 00:37:58.639] #21 0x00000000007ea090 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
 [2014/11/20 00:37:58.639] #22 0x0000000000f30bb1 in mongo::PortMessageServer::handleIncomingMsg(void*)
 [2014/11/20 00:37:58.639]     ()
 [2014/11/20 00:37:58.639] #23 0x00007f61e8bdf73d in start_thread () from /lib64/libpthread.so.0
 [2014/11/20 00:37:58.639] #24 0x00007f61e7db7d1d in clone () from /lib64/libc.so.6

Generated at Thu Feb 08 03:40:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.