-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Sharding
-
Fully Compatible
-
Sharding 2019-01-28, Sharding 2019-02-11
-
17
When mongos encounters an error during a transaction and the error is "retryable" (e.g. a snapshot error on the first client statement), it removes the newly added participants from the participant list and retries the request. It relies on the shards implicitly aborting the transactions started by the first attempt before servicing the new one.
If the operation on the router is killed (e.g. by killOp) after it clears the pending participants but before it re-targets, the router will not know to send abortTransaction to the shards targeted by the first attempt, which may leave transactions open on those shards. To handle this and to simplify the contract around router retries, the router should instead send abortTransaction to every shard it removes from the participant list, and wait for all responses, before retrying. The ability for a shard to start a new transaction at the same transaction number as an in-progress one should also be removed.
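The proposed ordering can be illustrated with a small Python model. This is only a sketch, not the mongos implementation: the class `Router`, its methods, and the `send_abort` callback are hypothetical names introduced here, and `send_abort` stands in for sending abortTransaction to one shard and waiting for its reply.

```python
from concurrent.futures import ThreadPoolExecutor


class Router:
    """Toy model of a router's participant list for one transaction (hypothetical)."""

    def __init__(self, send_abort):
        # send_abort(shard_id) is an assumed callback that sends
        # abortTransaction to one shard and blocks until it responds.
        self._send_abort = send_abort
        self.participants = []  # shards kept from earlier statements
        self.pending = []       # shards newly added by the current statement

    def add_pending(self, shard_id):
        if shard_id not in self.participants and shard_id not in self.pending:
            self.pending.append(shard_id)

    def clear_pending_and_abort(self):
        """Abort on every pending shard, wait for all replies, then drop them."""
        to_abort = list(self.pending)
        if to_abort:
            with ThreadPoolExecutor(max_workers=len(to_abort)) as pool:
                # list() consumes every result, so all aborts have
                # completed before the pending list is cleared.
                list(pool.map(self._send_abort, to_abort))
        self.pending = []
        return to_abort
```

In this sketch the pending list is cleared only after every abort has returned. If the operation is interrupted earlier, the router still holds the shards it needs to abort, which is the window the description is concerned with.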