[SERVER-39624] Reordering of transaction requests from mongos can lead shards to unexpectedly have in-progress transactions
Created: 15/Feb/19  Updated: 29/Oct/23  Resolved: 26/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Bug
Priority: Major - P3
Reporter: Jack Mulrow
Assignee: Jack Mulrow
Resolution: Fixed
Votes: 0
Labels: ShardedTxn:RouterSupport
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Related
related to SERVER-39704 Allow mongos to retry on stale versio... Needs Scheduling
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2019-03-11
Participants:
Linked BF Score: 20

 Description   

While running a sharded transaction, the router tracks in-memory each shard that has been involved in the transaction. When a new shard is targeted by a statement, the router adds it to the participant list as a pending participant and attaches startTransaction=true and the active transaction's txnId to the next request sent to that shard.

The shard is considered "pending" because of the shard versioning protocol: the router won't know whether the shard was able to satisfy the request sent to it until it gets an OK response. To handle the case where a pending participant returns a stale version error, the router will abort the active txnId on each pending participant (not just the one that returned the error), wait for each to respond, then retry the current statement, possibly targeting a different set of shards. This allows the router to handle these errors within a transaction, instead of returning a transient error and making the client retry with a new txnId.

To enable this behavior, shards can accept a request with startTransaction=true more than once for a txnId, but only if the shard's local transaction is in the aborted state. This relies on the abort, which the router sends to every pending participant before retrying, reaching each participant after that participant's first request with startTransaction=true. That ordering is not guaranteed in an asynchronous network. If the abort arrives before the first request on a shard and that shard is targeted by the retry, the late first request starts the transaction and the retry will be rejected by that shard because it will have an in-progress transaction at that txnId.
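
The race described above can be illustrated with a small model. This is a minimal sketch, not server code: the `Shard` class, the `deliver` helper, and the message names are hypothetical, and it assumes that an abort arriving at a shard with no transaction for that txnId leaves the shard's state unchanged.

```python
from enum import Enum


class TxnState(Enum):
    NONE = "none"
    IN_PROGRESS = "in_progress"
    ABORTED = "aborted"


class Shard:
    """Toy model of one shard's state for a single txnId (hypothetical)."""

    def __init__(self):
        self.state = TxnState.NONE

    def start_transaction(self):
        # startTransaction=true is accepted for a new txnId, or again
        # only if the local transaction is in the aborted state.
        if self.state is TxnState.IN_PROGRESS:
            return False
        self.state = TxnState.IN_PROGRESS
        return True

    def abort(self):
        # Assumption: an abort that finds no in-progress transaction
        # does not change the shard's state.
        if self.state is TxnState.IN_PROGRESS:
            self.state = TxnState.ABORTED


def deliver(order):
    """Deliver messages to a fresh shard in the given arrival order."""
    shard = Shard()
    results = []
    for msg in order:
        if msg == "abort":
            shard.abort()
        else:
            # "first" and "retry" both carry startTransaction=true.
            results.append((msg, shard.start_transaction()))
    return results


# Order the router intends: first request, then abort, then retry.
in_order = deliver(["first", "abort", "retry"])

# Order the network may produce: the abort overtakes the first request.
reordered = deliver(["abort", "first", "retry"])
```

In the intended order both the first request and the retry are accepted. In the reordered case the retry is rejected, because the late first request leaves the shard with an in-progress transaction.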



 Comments   
Comment by Githook User [ 26/Feb/19 ]

Author: Jack Mulrow <jack.mulrow@mongodb.com> (jsmulrow)

Message: SERVER-39624 Allow finds to fail with StaleChunkHistory in snapshot_read_catalog_operations.js
Branch: master
https://github.com/mongodb/mongo/commit/76bea9daaf67deee934c50e6ebcb4e5d5bbb397e

Comment by Jack Mulrow [ 26/Feb/19 ]

A note from the review on why the router can still retry on the view resolution error even when the fail point isn't set:

I didn't disable retries on the ViewResolution error because otherwise it would be impossible to read from a view in a transaction. I think this is fine though, because requests that can return this error must have been sent to a single shard (the primary for the view's database), so the router does not need to send abort before retrying on the view resolution error from a pending participant, which avoids the problem in the description.

Comment by Githook User [ 26/Feb/19 ]

Author: Jack Mulrow <jack.mulrow@mongodb.com> (jsmulrow)

Message: SERVER-39624 Put internal router retries for stale version and snapshot errors behind a fail point
Branch: master
https://github.com/mongodb/mongo/commit/b0d3c0d2934a2096ba27362db22fe768b5341485

Comment by Jack Mulrow [ 20/Feb/19 ]

As discussed at the sharded transactions standup, we'll fix this by preventing mongos from retrying on stale shard/db version and snapshot errors within a transaction at all and having it return a TransientTransactionError label instead, so the client will retry the whole transaction at a higher txnNumber. This should allow sharded transactions to work in all cases without much work. My proposed approach above will instead be considered as part of the Optimize Cross Shard Transactions epic in SERVER-39704.
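
For illustration, the client-side behavior this fix relies on might look roughly like the following. It is a sketch under stated assumptions, not driver code: `TransientTransactionError` here is a stand-in exception for a response carrying that error label, and `run_with_retries` and `txn_body` are hypothetical names.

```python
class TransientTransactionError(Exception):
    """Stand-in for an error carrying the TransientTransactionError label."""


def run_with_retries(txn_body, first_txn_number=0, max_attempts=3):
    """Run txn_body, retrying the whole transaction at a higher txnNumber."""
    last_error = None
    for attempt in range(max_attempts):
        # Each attempt uses a strictly higher txnNumber than the last.
        txn_number = first_txn_number + attempt
        try:
            return txn_body(txn_number)
        except TransientTransactionError as exc:
            last_error = exc
    raise last_error


def flaky_body(txn_number):
    # Hypothetical transaction body: the first two attempts hit a stale
    # version error that the router surfaces as a transient error.
    if txn_number < 2:
        raise TransientTransactionError("stale version")
    return f"committed at txnNumber {txn_number}"


outcome = run_with_retries(flaky_body)
```

Because each attempt uses a fresh txnNumber, no shard sees startTransaction=true twice for the same transaction, which sidesteps the reordering problem in the description.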

Comment by Jack Mulrow [ 15/Feb/19 ]

Since the fundamental problem is that mongos can send multiple requests to the same shard with startTransaction=true for a txnId, we might be able to fix this by adding another field to the txnId that is generated by mongos and transparent to the user. Then each time mongos sends a shard startTransaction=true, it can generate a new value for this field (transaction version?) that shards can use to distinguish the earlier attempts. This field could be scoped to a particular txnNumber, so the existing machinery for client retries on transient transaction errors would still work, i.e. any comparison of txnIds always compares the txnNumber before this field.

If we include each shard's expected version in the participant list and send it with prepare/commit (or every request within a transaction), we can abort the entire transaction if any shard has an unexpected version and return a transient transaction error label so the client can retry with a higher txnNumber, since this should only happen because of reordered messages if the client operates correctly.

We could even get rid of the aborts between statement retries by making shards treat a higher transaction version the same as a higher txnNumber and overwrite any state from a previous attempt.
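
A minimal sketch of the proposal in this comment, assuming the new field is compared only after txnNumber and that a higher value overwrites any earlier attempt. The names `VersionedShard` and `txn_version` are hypothetical; the comment itself only tentatively calls the field a "transaction version".

```python
class VersionedShard:
    """Toy shard that tracks (txnNumber, transaction version) per session."""

    def __init__(self):
        # Highest (txn_number, txn_version) pair this shard has started.
        self.active = None

    def start_transaction(self, txn_number, txn_version):
        incoming = (txn_number, txn_version)
        # Tuples compare txn_number first and txn_version second, so a
        # higher txnNumber always wins regardless of the version field.
        if self.active is not None and incoming <= self.active:
            return False  # stale or duplicate attempt, reject it
        # A higher pair overwrites any state from a previous attempt.
        self.active = incoming
        return True


shard = VersionedShard()

# The retry (version 1) overtakes the original request (version 0).
retry_accepted = shard.start_transaction(txn_number=5, txn_version=1)
late_first_accepted = shard.start_transaction(txn_number=5, txn_version=0)

# A client-level retry at a higher txnNumber still supersedes everything.
next_txn_accepted = shard.start_transaction(txn_number=6, txn_version=0)
old_txn_accepted = shard.start_transaction(txn_number=5, txn_version=9)
```

With this ordering the late first request is rejected as stale instead of leaving an unexpected in-progress transaction, and no abort is needed between statement retries.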

Generated at Thu Feb 08 04:52:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.