[SERVER-30972] Deadlock in WiredTigerOplogManager on shutdown Created: 06/Sep/17  Updated: 30/Oct/23  Resolved: 12/Sep/17

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.6.0-rc0

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Eric Milkie
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File deadlock-wtoplog.png     Text File debugger_mongod_28171.log    
Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage 2017-10-02
Participants:
Linked BF Score: 0

 Description   

The WT engine deletes the oplog manager on shutdown while holding the oplog manager mutex:

void WiredTigerKVEngine::deleteOplogManager() {
    stdx::unique_lock<stdx::mutex> lock(_oplogManagerMutex);
    invariant(_oplogManagerCount > 0);
    _oplogManagerCount--;
    if (_oplogManagerCount == 0)
        _oplogManager.reset();
}

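The oplog manager's destructor waits for _oplogJournalThread to join. However, the oplog journal thread may be setting the oldest timestamp at that moment, which needs the oplog manager mutex that deleteOplogManager() is still holding. The journal thread then blocks on the mutex while the shutdown thread blocks on the join, causing a deadlock.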

Thread 22: "WTOplogJournalThread" (Thread 0x7f614ef05700 (LWP 28210))
#0  0x00007f616447d334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f61644785d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f61644784a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f6167d285fa in __gthread_mutex_lock (__mutex=0x7f616b274038) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/x86_64-mongodb-linux/bits/gthr-default.h:748
#4  std::mutex::lock (this=0x7f616b274038) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/mutex:135
#5  std::unique_lock<std::mutex>::lock (this=0x7f614ef044c0) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/mutex:485
#6  std::unique_lock<std::mutex>::unique_lock (__m=..., this=0x7f614ef044c0) at /opt/mongodbtoolchain/v2/include/c++/5.4.0/mutex:415
#7  mongo::WiredTigerKVEngine::_setOldestTimestamp (this=0x7f616b274000, oldestTimestamp=...) at src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp:1026
#8  0x00007f61680cef91 in mongo::repl::ReplicationCoordinatorImpl::_setStableTimestampForStorage_inlock (this=this@entry=0x7f616af03680) at src/mongo/db/repl/replication_coordinator_impl.cpp:3019
#9  0x00007f61680d0b8c in mongo::repl::ReplicationCoordinatorImpl::_updateCommitPoint_inlock (this=this@entry=0x7f616af03680) at src/mongo/db/repl/replication_coordinator_impl.cpp:3044
#10 0x00007f61680d1068 in mongo::repl::ReplicationCoordinatorImpl::_updateLastCommittedOpTime_inlock (this=this@entry=0x7f616af03680) at src/mongo/db/repl/replication_coordinator_impl.cpp:2951
#11 0x00007f61680d1702 in mongo::repl::ReplicationCoordinatorImpl::_setMyLastDurableOpTime_inlock (this=this@entry=0x7f616af03680, opTime=..., isRollbackAllowed=isRollbackAllowed@entry=false) at src/mongo/db/repl/replication_coordinator_impl.cpp:1053
#12 0x00007f61680d182b in mongo::repl::ReplicationCoordinatorImpl::setMyLastDurableOpTimeForward (this=0x7f616af03680, opTime=...) at src/mongo/db/repl/replication_coordinator_impl.cpp:978
#13 0x00007f6167d3f222 in mongo::WiredTigerSessionCache::waitUntilDurable (this=this@entry=0x7f616b277f00, forceCheckpoint=forceCheckpoint@entry=false, stableCheckpoint=stableCheckpoint@entry=false) at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:268
#14 0x00007f6167d2c775 in mongo::WiredTigerOplogManager::_oplogJournalThreadLoop (this=0x7f616ea27480, sessionCache=0x7f616b277f00, oplogRecordStore=0x7f616ea30300) at src/mongo/db/storage/wiredtiger/wiredtiger_oplog_manager.cpp:177
#15 0x00007f6169285eb0 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.4.0/libstdc++-v3/src/c++11/thread.cc:84
#16 0x00007f6164476aa1 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f61641c3bcd in clone () from /lib64/libc.so.6

Attached is the debugger's log and the lock dependency graph. The join wait is not shown in the graph.
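
One common way to avoid this kind of deadlock is to detach the owned object while holding the mutex and let its destructor (and the thread join) run only after the mutex is released. The sketch below illustrates that pattern in isolation; it is not the change made for this ticket. Engine, Manager, start(), stop(), setOldestTimestamp() and the accessors are hypothetical stand-ins for WiredTigerKVEngine, WiredTigerOplogManager and their methods, and the reference counting in deleteOplogManager() is left out for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

class Engine;  // hypothetical stand-in for the KV engine

// Hypothetical stand-in for the oplog manager: owns a worker thread
// and joins it in the destructor.
class Manager {
public:
    explicit Manager(Engine* engine);
    ~Manager();

private:
    Engine* _engine;
    std::atomic<bool> _shutdown{false};
    std::thread _worker;  // declared last so it starts after the other members
};

class Engine {
public:
    void start() {
        std::lock_guard<std::mutex> lock(_mutex);
        _manager.reset(new Manager(this));
    }

    // Called from the worker thread; takes the same mutex as stop().
    void setOldestTimestamp(long ts) {
        std::lock_guard<std::mutex> lock(_mutex);
        _oldest = ts;
    }

    void stop() {
        std::unique_ptr<Manager> doomed;
        {
            // Only detach the manager while the mutex is held.
            std::lock_guard<std::mutex> lock(_mutex);
            doomed = std::move(_manager);
        }
        // The destructor joins the worker here, with the mutex released,
        // so a worker blocked in setOldestTimestamp() can still make progress.
        doomed.reset();
    }

    bool hasManager() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _manager != nullptr;
    }

    long oldestTimestamp() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _oldest;
    }

private:
    std::mutex _mutex;  // declared first so it outlives _manager on destruction
    long _oldest = 0;
    std::unique_ptr<Manager> _manager;
};

Manager::Manager(Engine* engine)
    : _engine(engine), _worker([this] {
          long ts = 0;
          // Always calls into the engine at least once before exiting.
          do {
              _engine->setOldestTimestamp(++ts);
              std::this_thread::yield();
          } while (!_shutdown.load());
      }) {}

Manager::~Manager() {
    _shutdown.store(true);
    _worker.join();
}
```

With this shape, calling start() followed by stop() returns even if the worker is waiting on the mutex when shutdown begins. If stop() instead destroyed the manager inside the locked scope, the worker's first call to setOldestTimestamp() could block forever while the destructor waits on join(), which is the situation shown in the stack trace above.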



 Comments   
Comment by Ramon Fernandez Marina [ 12/Sep/17 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-30972 avoid oplog manager deadlock at shutdown
Branch: master
https://github.com/mongodb/mongo/commit/bddfee0d34513631424363645139f8eb4acbbc4c
