Core Server / SERVER-46704

Two-phase index build can violate lock ordering and lead to deadlocks.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.4.0-rc0, 4.7.0
    • Affects Version/s: None
    • Component/s: Storage
    • None
    • Fully Compatible
    • ALL
    • v4.4
    • Execution Team 2020-03-09, Execution Team 2020-03-23
    • 49

      Currently, IndexBuildsCoordinatorMongod::voteCommitIndexBuild() violates the lock ordering: it tries to acquire the RSTL in mode IX while holding ReplIndexBuildState::mutex. As a result, it can deadlock with the step-up code path (ReplicationCoordinatorImpl::signalDrainComplete), which acquires the RSTL in mode X first and then tries to send an abort or commit signal to the index build while holding ReplIndexBuildState::mutex.
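The inversion described above can be illustrated with a minimal, standalone C++17 sketch. Everything in it is a hypothetical stand-in rather than MongoDB code: `Locks`, `voteCommit`, `signalDuringStepUp`, and `runBoth` are invented names, and a `std::shared_mutex` is used as a rough model of the RSTL (shared standing in for mode IX, exclusive for mode X). It shows one common way to avoid this class of deadlock, namely taking the two locks in the same order on both paths, and is not necessarily the fix applied for this ticket.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-ins for illustration only; not MongoDB's real types.
struct Locks {
    std::shared_mutex rstl;      // rough model of the RSTL: shared ~ IX, exclusive ~ X
    std::mutex indexBuildState;  // rough model of ReplIndexBuildState::mutex
};

// Problematic ordering described in the ticket (shown as a comment, not run):
//   voter path:   lock(indexBuildState), then lock_shared(rstl)
//   step-up path: lock(rstl),            then lock(indexBuildState)
// Each thread can end up holding one lock while waiting for the other.

// Voter path using a consistent order: RSTL first, then the state mutex.
void voteCommit(Locks& l, int& votes) {
    std::shared_lock<std::shared_mutex> rstl(l.rstl);
    std::lock_guard<std::mutex> state(l.indexBuildState);
    ++votes;  // guarded by indexBuildState
}

// Step-up path: RSTL in exclusive mode first, then the state mutex.
void signalDuringStepUp(Locks& l, int& signals) {
    std::unique_lock<std::shared_mutex> rstl(l.rstl);
    std::lock_guard<std::mutex> state(l.indexBuildState);
    ++signals;  // guarded by indexBuildState
}

// Runs both paths concurrently and returns the total number of completed
// operations. With a single lock order, neither thread can block the other
// indefinitely.
int runBoth(int iterations) {
    Locks l;
    int votes = 0;
    int signals = 0;
    std::thread voter([&] {
        for (int i = 0; i < iterations; ++i) voteCommit(l, votes);
    });
    std::thread stepUp([&] {
        for (int i = 0; i < iterations; ++i) signalDuringStepUp(l, signals);
    });
    voter.join();
    stepUp.join();
    return votes + signals;
}
```

Calling `runBoth(1000)` should complete and return 2000. Swapping the acquisition order in only one of the two functions would reintroduce the possibility of the deadlock described in the ticket.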

      Note:
      The ticket also addresses 3 more issues.
      1) Currently, the index build (an internal system thread) holds the RSTL with an uninterruptible lock guard enabled, so it blocks replication state transitions such as step up and step down. (SERVER-44045)

      2) We acquire the collection lock in a stronger mode (mode X) in order to commit or abort. As a result, this can lead to deadlocks involving prepared transactions, stepdown, and the indexBuildsCoordinator. (SERVER-44722)

      3) Currently, IndexBuildsCoordinatorMongod::_waitForNextIndexBuildAction() holds the RSTL only for the scope of the while loop. As a result, the primary check that we do at this line may no longer be valid after the loop exits. (SERVER-46989)

      4) Also, the index build retries its vote on error without checking for interrupts, such as shutdown interrupts. This makes shutdown hang forever, as it waits for the index builds to complete.
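The retry problem in item 4 can be sketched in isolation. The following is a simplified illustration under stated assumptions, not the server's implementation: `VoteResult`, `voteWithRetry`, `tryVote`, and `shutdownRequested` are hypothetical names, and a plain atomic flag stands in for the server's interrupt mechanism. The point is only that the loop checks for an interrupt before every attempt, so a shutdown request ends the retries instead of waiting on them forever.

```cpp
#include <atomic>
#include <functional>

// Hypothetical result type for this sketch.
enum class VoteResult { kCommitted, kInterrupted };

// Retries tryVote() until it succeeds, but checks the (hypothetical)
// shutdown flag before each attempt so that the loop can be interrupted.
VoteResult voteWithRetry(const std::function<bool()>& tryVote,
                         const std::atomic<bool>& shutdownRequested) {
    for (;;) {
        if (shutdownRequested.load()) {
            return VoteResult::kInterrupted;  // stop retrying on shutdown
        }
        if (tryVote()) {
            return VoteResult::kCommitted;
        }
        // A real implementation would likely back off here before retrying.
    }
}
```

A caller would pass a callable that performs one vote attempt and returns whether it succeeded. Without the flag check, a vote that keeps failing would keep this loop, and anything waiting on it, alive indefinitely.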

      UPDATE: This ticket won't address the 3 additional issues. They will be addressed separately.

            Assignee:
            suganthi.mani@mongodb.com Suganthi Mani
            Reporter:
            suganthi.mani@mongodb.com Suganthi Mani
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: