|
This issue was originally discovered by the Coverity Static Analysis tool.
Consider the following lock acquisitions in InitialSyncer and ReplicationCoordinatorImpl:
ReplicationCoordinatorImpl::processReplSetSyncFrom
- Acquire ReplicationCoordinatorImpl::_mutex
- Acquire InitialSyncer::_mutex
InitialSyncer::_multiApplierCallback
- Acquire InitialSyncer::_mutex
- Acquire ReplicationCoordinatorImpl::_mutex
Because these two functions acquire the same two locks in opposite orders, a deadlock is possible if they run concurrently. One way to fix this would be to stop InitialSyncer from updating the optime of the ReplicationCoordinator on every batch. Alternatively, _multiApplierCallback could call _opts.setLastOpTime without holding its own mutex, since it doesn't seem necessary to synchronize access to InitialSyncer::_lastApplied after it has been written in that function.
This issue also occurs in InitialSyncer::_getNextApplierBatchCallback, which acquires the InitialSyncer mutex, and then tries to acquire ReplicationCoordinator's mutex when calling _opts.getSlaveDelay().
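The second alternative (calling back into the coordinator only after releasing the syncer's own mutex) could look roughly like the sketch below. This is not the server code: Coordinator, Syncer, readSyncerProgress, onBatchApplied, and runBothPaths are hypothetical stand-ins, optimes are reduced to plain integers, and the "keep the maximum" rule in setLastOpTime is an assumption made so that late updates are harmless.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

class Syncer;  // hypothetical stand-in for InitialSyncer

// Hypothetical stand-in for ReplicationCoordinatorImpl.
class Coordinator {
public:
    // Takes only the coordinator mutex. Keeping the maximum is an assumption
    // of this sketch, so an update that arrives late cannot move time backwards.
    void setLastOpTime(std::int64_t t) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (t > _lastOpTime) _lastOpTime = t;
    }

    std::int64_t lastOpTime() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastOpTime;
    }

    // Same lock order as processReplSetSyncFrom in the report:
    // coordinator mutex first, then the syncer mutex.
    std::int64_t readSyncerProgress(const Syncer& syncer) const;

private:
    mutable std::mutex _mutex;
    std::int64_t _lastOpTime = 0;
};

class Syncer {
public:
    explicit Syncer(Coordinator& coordinator) : _coordinator(coordinator) {}

    std::int64_t lastApplied() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastApplied;
    }

    // Plays the role of _multiApplierCallback: update state under the syncer
    // mutex, copy the value, release the mutex, and only then call out.
    void onBatchApplied(std::int64_t batchEnd) {
        std::int64_t snapshot;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            assert(batchEnd >= _lastApplied);  // one applier, so progress is monotonic
            _lastApplied = batchEnd;
            snapshot = _lastApplied;
        }  // syncer mutex released here
        _coordinator.setLastOpTime(snapshot);  // takes the coordinator mutex alone
    }

private:
    mutable std::mutex _mutex;
    Coordinator& _coordinator;
    std::int64_t _lastApplied = 0;
};

inline std::int64_t Coordinator::readSyncerProgress(const Syncer& syncer) const {
    std::lock_guard<std::mutex> lk(_mutex);
    return syncer.lastApplied();  // nested acquisition: coordinator, then syncer
}

// Runs both code paths concurrently and returns the coordinator's final optime.
inline std::int64_t runBothPaths(int iterations) {
    Coordinator coordinator;
    Syncer syncer(coordinator);
    std::thread applier([&] {
        for (int i = 1; i <= iterations; ++i) syncer.onBatchApplied(i);
    });
    std::thread command([&] {
        for (int i = 0; i < iterations; ++i) coordinator.readSyncerProgress(syncer);
    });
    applier.join();
    command.join();
    return coordinator.lastOpTime();
}
```

With this shape the syncer never holds its mutex while waiting for the coordinator's, so only one nesting order (coordinator, then syncer) remains and the cycle is gone. The trade-off is that the coordinator may briefly lag behind the syncer's state. The same copy-then-release pattern would apply to the _opts.getSlaveDelay() call, for example by reading the delay before taking the syncer mutex.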
Original Coverity Report Message:
Defect 100780 (STATIC_C)
Checker ORDER_REVERSAL (subcategory none)
File: /src/mongo/db/repl/replication_coordinator_impl.cpp
Function mongo::repl::ReplicationCoordinatorImpl::processReplSetSyncFrom(mongo::OperationContext *, const mongo::HostAndPort &, mongo::BSONObjBuilder *)
|