Core Server / SERVER-45845

TransactionCoordinator stepUp can deadlock

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Sharding
    • ALL

    Description

      Scenario:

      • A transaction on namespace foo.bar is in the prepared state.
      • A new primary has just stepped up on this shard.

      Sequence of events to deadlock:
      1. The new primary's TransactionParticipants ensure that the locks needed by the prepared transaction are acquired.
      2. An operation performs a write, generating a new oplog entry and advancing the last op timestamp.
      3. An operation requiring a conflicting exclusive lock arrives on the new primary.
      4. Multiple operations that conflict with the exclusive lock also arrive and block behind the lock request of the operation in #3. Enough of them arrive to exhaust the read tickets.
      5. The TransactionCoordinatorService stepUp code kicks in and tries to wait for the last op to become majority committed.
      6. Secondaries try to fetch the oplog from the new primary but cannot query it because the read tickets are already exhausted, so the majority timestamp does not advance.
      7. A retried CoordinatorCommit command for the prepared transaction arrives and tries to wait for TransactionCoordinatorService to fully step up before proceeding. Deadlock occurs. Note that TransactionCoordinatorService will also try to resume any in-progress coordinators after waiting for majority.
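
      The sequence above can be read as a cycle in a wait-for graph. The sketch below is only an illustrative model, not server code: the node names are hypothetical labels for the actors in steps 1 to 7, and the edge from the prepared transaction back to the retried commit assumes that the prepared transaction keeps its locks until the commit decision is delivered.

```python
from typing import Dict, List, Optional

# Hypothetical wait-for graph: each key is blocked until its value makes progress.
# Names are illustrative labels for the steps in this ticket, not server identifiers.
WAITS_FOR: Dict[str, str] = {
    "retried_commit": "coordinator_step_up",          # step 7
    "coordinator_step_up": "majority_commit_point",   # step 5
    "majority_commit_point": "secondary_oplog_fetch", # step 6
    "secondary_oplog_fetch": "read_ticket_holders",   # step 6: no read ticket is free
    "read_ticket_holders": "exclusive_lock_request",  # step 4
    "exclusive_lock_request": "prepared_txn_locks",   # steps 1 and 3
    # Assumption: the prepared transaction releases its locks only once the
    # commit decision arrives.
    "prepared_txn_locks": "retried_commit",
}


def find_cycle(waits_for: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    seen: List[str] = []
    node = start
    while node in waits_for:
        if node in seen:
            # Revisiting a node means everything from its first visit is a cycle.
            return seen[seen.index(node):]
        seen.append(node)
        node = waits_for[node]
    return None  # reached an actor that is not blocked, so no deadlock


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR, "retried_commit"))
```

      In this model, removing any single edge breaks the cycle. For example, if the secondaries' oplog fetch did not need a read ticket, the walk would end at an unblocked actor.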

          People

            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            randolph@mongodb.com Randolph Tan