- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: Sharding
- None
- Sharding
- ALL
Scenario:
- A transaction on namespace foo.bar is in the prepared state.
- A new primary has just stepped up on this shard.
Sequence of events leading to the deadlock:
1. The new primary's TransactionParticipants ensure the necessary locks are acquired for the prepared transaction.
2. An operation performs a write, generating a new oplog entry and advancing the last op timestamp.
3. An operation requiring a conflicting exclusive lock arrives on the new primary.
4. Multiple operations that conflict with the exclusive lock also arrive and block behind the lock request of the operation in #3. Enough of them arrive to exhaust the read tickets.
5. The TransactionCoordinatorService stepUp code kicks in and tries to wait for the last op to become majority committed.
6. Secondaries try to fetch the oplog from the new primary but cannot query it because the read tickets are already exhausted, so the majority timestamp does not advance.
7. A retried CoordinatorCommit command for the prepared transaction arrives and tries to wait for the TransactionCoordinatorService to fully step up before proceeding. Deadlock occurs. Note that the TransactionCoordinatorService will also try to start coordinating in-progress coordinators after waiting for majority.
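The sequence above can be read as a wait-for cycle. The following is a minimal Python sketch of that reading, not server code: the node names are hypothetical labels for the actors in steps 1-7, and each edge means "waits for". It also checks the assumption that dropping the edge from the secondaries' oplog fetch to the read tickets (which is what the title of the linked SERVER-45953 suggests) leaves no cycle in this simplified model.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the actors in steps 1-7; an edge A -> B means "A waits for B".
WAITS_FOR: Dict[str, List[str]] = {
    "prepared_txn": ["coordinate_commit_retry"],              # holds locks until committed
    "coordinate_commit_retry": ["coordinator_service_step_up"],      # step 7
    "coordinator_service_step_up": ["majority_commit_of_last_op"],   # step 5
    "majority_commit_of_last_op": ["secondary_oplog_fetch"],         # step 6
    "secondary_oplog_fetch": ["read_ticket"],                        # step 6
    "read_ticket": ["blocked_operations"],                           # step 4 holds all tickets
    "blocked_operations": ["exclusive_lock_request"],                # step 4
    "exclusive_lock_request": ["prepared_txn"],                      # step 3 vs. step 1 locks
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one wait-for cycle as a list of nodes, or None if the graph is acyclic."""
    path: List[str] = []   # nodes on the current depth-first search path
    finished = set()       # nodes already proven not to lead to a cycle

    def dfs(node: str) -> Optional[List[str]]:
        if node in finished:
            return None
        if node in path:
            # Reached a node already on the current path: the cycle is the path suffix.
            return path[path.index(node):] + [node]
        path.append(node)
        for successor in graph.get(node, []):
            cycle = dfs(successor)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in graph:
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


def without_edge(graph: Dict[str, List[str]], src: str, dst: str) -> Dict[str, List[str]]:
    """Copy of the graph with the single edge src -> dst removed."""
    return {n: [s for s in succ if not (n == src and s == dst)] for n, succ in graph.items()}


if __name__ == "__main__":
    print("cycle:", find_cycle(WAITS_FOR))
    # Assumption: oplog readers no longer need a read ticket.
    exempted = without_edge(WAITS_FOR, "secondary_oplog_fetch", "read_ticket")
    print("cycle with oplog readers exempted:", find_cycle(exempted))
```

In this model every actor waits on exactly one other, so removing any single edge breaks the cycle; the ticket exemption is simply the edge the linked issue targets.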
- depends on: SERVER-45953 Exempt oplog readers from acquiring read tickets (Closed)
- duplicates: SERVER-45953 Exempt oplog readers from acquiring read tickets (Closed)