  1. Core Server
  2. SERVER-45845

TransactionCoordinator stepUp can deadlock

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Sharding
    • Sharding
    • ALL

      Scenario:

      • A transaction on namespace foo.bar is in the prepared state.
      • A new primary has just stepped up on this shard.

      Sequence of events to deadlock:
      1. The new primary's TransactionParticipants make sure the necessary locks are acquired for the prepared transaction.
      2. An operation performs a write, generating a new oplog entry and advancing the last op timestamp.
      3. An operation requiring an exclusive lock that conflicts with the locks from step 1 arrives on the new primary.
      4. Multiple operations that conflict with that exclusive lock also arrive and block behind the lock request from step 3. Enough of them arrive to exhaust the read tickets.
      5. The TransactionCoordinatorService stepUp code kicks in and tries to wait for the last op to become majority committed.
      6. Secondaries try to fetch the oplog from the new primary but can't query it because the read tickets are already exhausted, so the majority timestamp won't advance.
      7. A retried CoordinatorCommit command for the prepared transaction arrives and tries to wait for TransactionCoordinatorService to fully step up before proceeding. Deadlock occurs: the prepared transaction can't be committed, so its locks are never released. Note also that TransactionCoordinatorService tries to resume in-progress coordinators after waiting for majority.
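
      The sequence above forms a wait-for cycle. The sketch below is a toy model of it, not MongoDB code: the node names are informal labels for the steps in this ticket (all of them are assumptions made for illustration), and each entry simply records "X cannot make progress until Y does". Following the edges from any node should lead back to that node.

```python
from typing import Dict, List, Optional

# Each key waits on its value. The labels are informal names for the steps in
# this ticket, not MongoDB identifiers.
WAITS_ON: Dict[str, str] = {
    "prepared txn locks": "CoordinatorCommit retry",      # locks held until commit (step 1)
    "CoordinatorCommit retry": "coordinator stepUp",      # step 7
    "coordinator stepUp": "majority commit of last op",   # step 5
    "majority commit of last op": "secondary oplog fetch",
    "secondary oplog fetch": "read ticket",               # step 6
    "read ticket": "blocked ticket holders",              # step 4
    "blocked ticket holders": "exclusive lock request",   # step 4
    "exclusive lock request": "prepared txn locks",       # step 3
}


def find_cycle(waits_on: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached."""
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # Revisited a node: the cycle is the path from its first visit.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = waits_on.get(node)  # None means this node is not waiting on anything
    return None


if __name__ == "__main__":
    cycle = find_cycle(WAITS_ON, "coordinator stepUp")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

      In this model, removing any single edge (for example, if the secondaries' oplog fetch did not need a read ticket) breaks the cycle and `find_cycle` returns `None`.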

            Assignee:
            backlog-server-sharding [DO NOT USE] Backlog - Sharding Team
            Reporter:
            randolph@mongodb.com Randolph Tan
            Votes:
            0
            Watchers:
            11
