- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Fully Compatible
- ALL
- Sharding NYC 2022-04-18

Resetting the RetryableWriteTransactionParticipantCatalog when starting an internal transaction for a non-retryable write leads to an invariant failure in the following scenario.
Consider a sharded collection with two chunks, chunk0 and chunk1, which reside on shard0 and shard1, respectively:
- The client starts a retryableWrite=false session S and performs a write against chunk0 in a transaction T0 with txnNumber0.
- chunk0 is moved from shard0 to shard1. During the migration, shard1 writes a dead-end sentinel noop oplog entry for T0.
- chunk0 is moved from shard1 back to shard0.
- The client performs a write against chunk1 in session S outside of a transaction. The write needs to be executed using a transaction, so the router executes it in an internal transaction T1 for a non-retryable write. Upon starting T1, shard1 resets (invalidates) the RetryableWriteTransactionParticipantCatalog for S.
- chunk0 is moved from shard0 to shard1 again. During the migration, shard1 hits the invariant failure while processing the dead-end sentinel noop oplog entry for T0. Note that beginOrContinue() does not fail on shard1 with IncompleteTransactionHistory because, from the perspective of shard1, txnNumber0 has always corresponded to a retryable write due to the dead-end sentinel noop oplog entry it wrote in step 2 (the first migration).
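To make the interaction between the steps above concrete, here is a toy Python model of shard1's state across the scenario. It is a sketch under stated assumptions, not MongoDB server code: `ToyShard`, `retryable_history`, `catalog`, `process_sentinel`, and the invariant it checks are all hypothetical, chosen only to mirror the ticket's description that the reset in step 4 leaves shard1 still treating txnNumber0 as a retryable write while its catalog no longer tracks session S.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class IncompleteTransactionHistory(Exception):
    """Toy stand-in for the error beginOrContinue() would raise."""


class InvariantFailure(Exception):
    """Toy stand-in for a server invariant failure."""


@dataclass
class ToyShard:
    name: str
    # Hypothetical: per-session txnNumbers this shard treats as retryable
    # writes, learned from oplog entries such as the dead-end sentinel.
    retryable_history: dict[str, set[int]] = field(default_factory=dict)
    # Hypothetical stand-in for the RetryableWriteTransactionParticipantCatalog:
    # sessions whose retryable-write state is currently tracked.
    catalog: set[str] = field(default_factory=set)

    def write_dead_end_sentinel(self, session: str, txn_number: int) -> None:
        # The migration destination records txn_number as a retryable write.
        self.retryable_history.setdefault(session, set()).add(txn_number)
        self.catalog.add(session)

    def start_internal_txn_for_non_retryable_write(self, session: str) -> None:
        # The behavior the ticket describes: reset the catalog for the session,
        # while the history recorded by the sentinel is left untouched.
        self.catalog.discard(session)

    def process_sentinel(self, session: str, txn_number: int) -> None:
        # beginOrContinue() analogue: only unknown history is rejected.
        if txn_number not in self.retryable_history.get(session, set()):
            raise IncompleteTransactionHistory(f"{session} txnNumber {txn_number}")
        # Assumed invariant (hypothetical): a txnNumber accepted as a
        # retryable write must still be tracked by the catalog.
        if session not in self.catalog:
            raise InvariantFailure(
                f"{self.name}: catalog has no entry for session {session!r} "
                f"at txnNumber {txn_number}"
            )


def run_scenario(reset_catalog: bool = True) -> str:
    shard1 = ToyShard("shard1")
    session, txn0 = "S", 0
    # Step 1 (T0 on shard0) leaves no state on shard1 in this model.
    # Step 2: the first migration to shard1 writes the dead-end sentinel for T0.
    shard1.write_dead_end_sentinel(session, txn0)
    # Step 3: chunk0 moves back to shard0; shard1 keeps its history.
    # Step 4: internal transaction T1 for a non-retryable write starts on shard1.
    if reset_catalog:
        shard1.start_internal_txn_for_non_retryable_write(session)
    # Step 5: chunk0 migrates to shard1 again and the sentinel for T0 is processed.
    try:
        shard1.process_sentinel(session, txn0)
    except InvariantFailure as exc:
        return f"invariant failure: {exc}"
    return "ok"


if __name__ == "__main__":
    print(run_scenario())                     # full scenario, including the reset
    print(run_scenario(reset_catalog=False))  # same scenario without the reset
```

In this model the full scenario returns an invariant failure, while skipping the reset in step 4 returns `ok`, which isolates the catalog reset as the trigger. The history check passes in both runs, reflecting why beginOrContinue() does not fail with IncompleteTransactionHistory on shard1.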