Core Server / SERVER-28840

replSetSyncFrom causes InitialSyncer and ReplicationCoordinator to acquire each other's mutexes in opposite orders

    • Replication
    • ALL

      This issue was originally discovered by the Coverity Static Analysis tool.

      Consider the following lock acquisitions in InitialSyncer and ReplicationCoordinatorImpl:

      ReplicationCoordinatorImpl::processReplSetSyncFrom
      1. Acquire ReplicationCoordinatorImpl::_mutex
      2. Acquire InitialSyncer::_mutex

      InitialSyncer::_multiApplierCallback
      1. Acquire InitialSyncer::_mutex
      2. Acquire ReplicationCoordinatorImpl::_mutex
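      The reversal in the two sequences above can be checked mechanically. The following is only a minimal sketch of a lock-order check, assuming a hypothetical LockOrderTracker helper that is not part of the MongoDB code base and is not a description of how Coverity implements ORDER_REVERSAL. It records "held, then acquired" pairs and flags a pair whose reverse has already been seen.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper for illustration only (not MongoDB or Coverity code).
// It remembers every (held, acquired) pair of lock names and reports when a
// new pair is the reverse of one recorded earlier.
class LockOrderTracker {
public:
    // Records that `next` was acquired while `held` was already held.
    // Returns true if the opposite order (next held, then held acquired)
    // was recorded before, i.e. the two orders can deadlock.
    bool recordAndCheck(const std::string& held, const std::string& next) {
        const bool reversed = _edges.count(std::make_pair(next, held)) > 0;
        _edges.emplace(held, next);
        return reversed;
    }

private:
    std::set<std::pair<std::string, std::string>> _edges;
};

// Replays the two acquisition sequences from the ticket and returns true if
// the second one reverses the first.
inline bool ticketSequencesConflict() {
    LockOrderTracker tracker;
    const std::string coord = "ReplicationCoordinatorImpl::_mutex";
    const std::string syncer = "InitialSyncer::_mutex";
    // processReplSetSyncFrom: coordinator mutex first, then syncer mutex.
    const bool first = tracker.recordAndCheck(coord, syncer);
    assert(!first);  // nothing recorded yet, so no reversal is possible
    // _multiApplierCallback: syncer mutex first, then coordinator mutex.
    return tracker.recordAndCheck(syncer, coord);
}
```

      Calling ticketSequencesConflict() replays both functions' acquisitions in turn; the second sequence is reported as a reversal of the first.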

      Because these two functions acquire the same two locks in opposite orders, they can deadlock if they run concurrently. One fix would be to stop InitialSyncer from updating the ReplicationCoordinator's optime on every batch. Alternatively, _multiApplierCallback could call _opts.setLastOpTime after releasing its own mutex, since it does not seem necessary to synchronize access to InitialSyncer::_lastApplied once that function has written to it.
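      The second option might look roughly like the sketch below. It is an illustration under simplifying assumptions, not the actual patch: OpTime is reduced to an integer, and CoordinatorSketch, InitialSyncerSketch, and onBatchApplied are hypothetical stand-ins for the real classes and for the tail of _multiApplierCallback. The point is only that the value is copied under the syncer's mutex and the callback runs after that mutex is released.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Simplified stand-in for the real OpTime type (assumption for this sketch).
using OpTime = long long;

// Hypothetical stand-in for ReplicationCoordinatorImpl.
class CoordinatorSketch {
public:
    void setMyLastOpTime(OpTime t) {
        std::lock_guard<std::mutex> lk(_mutex);  // coordinator mutex
        _lastOpTime = t;
    }
    OpTime lastOpTime() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastOpTime;
    }

private:
    std::mutex _mutex;
    OpTime _lastOpTime = 0;
};

// Hypothetical stand-in for InitialSyncer.
class InitialSyncerSketch {
public:
    explicit InitialSyncerSketch(std::function<void(OpTime)> setLastOpTime)
        : _setLastOpTime(std::move(setLastOpTime)) {}

    // Models the end of _multiApplierCallback: update _lastApplied under the
    // syncer's own mutex, then publish it only after that mutex is released.
    void onBatchApplied(OpTime lastAppliedInBatch) {
        OpTime toPublish = 0;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _lastApplied = lastAppliedInBatch;
            toPublish = _lastApplied;
        }  // syncer mutex released here

        // The callback may take the coordinator's mutex; the syncer's mutex
        // is no longer held, so the two locks are never nested on this path.
        _setLastOpTime(toPublish);
    }

    OpTime lastApplied() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastApplied;
    }

private:
    std::mutex _mutex;
    OpTime _lastApplied = 0;
    std::function<void(OpTime)> _setLastOpTime;
};
```

      With this shape, the syncer path never holds both mutexes at once, so it can no longer form a cycle with processReplSetSyncFrom.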

      This issue also occurs in InitialSyncer::_getNextApplierBatchCallback, which acquires the InitialSyncer mutex, and then tries to acquire ReplicationCoordinator's mutex when calling _opts.getSlaveDelay().
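      The ticket does not propose a specific fix for this second case. One possibility, sketched below purely as an assumption, is to read the slave delay before taking the InitialSyncer mutex so that the two locks are never nested. BatchSchedulerSketch and nextBatchDelay are hypothetical names; the getSlaveDelay callback stands in for _opts.getSlaveDelay().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the part of InitialSyncer that schedules batches.
class BatchSchedulerSketch {
public:
    explicit BatchSchedulerSketch(std::function<std::chrono::seconds()> getSlaveDelay)
        : _getSlaveDelay(std::move(getSlaveDelay)) {}

    // Models the start of _getNextApplierBatchCallback with the read hoisted:
    // the callback (which may lock the coordinator's mutex internally) runs
    // first, and only afterwards is the syncer's own mutex acquired.
    std::chrono::seconds nextBatchDelay() {
        const std::chrono::seconds delay = _getSlaveDelay();

        std::lock_guard<std::mutex> lk(_mutex);  // syncer mutex only
        _lastDelayUsed = delay;
        return _lastDelayUsed;
    }

private:
    std::mutex _mutex;
    std::chrono::seconds _lastDelayUsed{0};
    std::function<std::chrono::seconds()> _getSlaveDelay;
};
```

      Whether a slightly stale delay value is acceptable here is a design question for the real code; the sketch only shows the lock ordering.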


      Original Coverity Report Message:

      Defect 100780 (STATIC_C)
      Checker ORDER_REVERSAL (subcategory none)
      File: /src/mongo/db/repl/replication_coordinator_impl.cpp
      Function mongo::repl::ReplicationCoordinatorImpl::processReplSetSyncFrom(mongo::OperationContext *, const mongo::HostAndPort &, mongo::BSONObjBuilder *)

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: xgen-internal-coverity Coverity Collector User
            Votes: 0
            Watchers: 5
