-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Cluster Scalability
-
ALL
-
Cluster Scalability Priorities
-
0
Inside a retryable internal transaction, only the statements with stmtIds are retryable. In other words, for a retryable internal transaction used to executed writes for a particular statement in a retryable write command, only first write statement is retryable. The contract is that the owner of the transaction must not execute the subsequent write statements if the response for the first statement contains "retriedStmtIds" which indicates that the statement had already been executed.
Consider a sharded cluster with shard0 and shard1.
- Let's say a client runs a write command in a session with retryWrites: true against mongos0. The retryable write translates to with two write statements w0 against shard0 and w1 against shard1 that need to be executed atomically (in a retryable internal transaction).
- w0 is retryable and has "stmtId": 0.
- w1 is not retryable and has "stmtId": -1.
- While the retryable internal transaction is committing (using two-phase commit), the client loses the connection to mongos0. So it retries the write command against mongos1.
- The retry causes mongos1 to start a retryable internal transaction for the retry. Let's say w1 arrives on shard0 right after shard0 has committed the transaction but shard1 has not. The session was also successfully checked out on shard0:prim since it is no active operation. The transaction had already been committed on shard0 so the retry didn't get a RetryableTransactionInProgress error and get blocked here. Due to the history for the transaction started in step (1), the checkStatementExecuted() check for w0 indicated the the statement had already been executed so it responds with "retriedStmtIds" : [ 0 ]. This causes mongos1 to conclude that this was a retry so it must not execute w1 and instead can return early immediately.
- If the client performs a read immediately after this without providing afterClusterTime >= commitTimestamp (i.e. before shard1 commits the transaction), it would not see w2. This breaks reads-your-own-writes.
In short, for a retryable internal transaction that involves writing to multiple shards and has short-circuiting, read-your-own-writes is only guaranteed in sessions with "causalConsistency" enabled.
- related to
-
SERVER-99782 Read-your-own-writes guarantee for a cross-shard transaction if the client recovers the decision from the recovery shard that is not also a coordinator shard
- Needs Scheduling