Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.1.9
Affects Version/s: None
Component/s: Sharding
Labels:
- ShardedTxn:RouterSupport

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Sharding 2019-03-11
Linked BF Score:
20
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

While running a sharded transaction, the router tracks, in memory, each shard that has been involved in the transaction. When a statement targets a new shard, the router adds it to the participant list as a pending participant and attaches startTransaction=true and the active transaction's txnId to the next request sent to that shard.
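
The participant tracking described above can be sketched roughly as follows. This is an illustrative model only, not the router's actual code: the class RouterTxn, its methods, and the "pending"/"confirmed" labels are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RouterTxn:
    """Hypothetical in-memory model of the router's view of one transaction."""

    txn_id: int
    # shard name -> "pending" or "confirmed" (labels are assumptions)
    participants: dict = field(default_factory=dict)

    def build_request(self, shard, command):
        """Attach the txnId, plus startTransaction=true for a newly targeted shard."""
        request = {"txnId": self.txn_id, **command}
        if shard not in self.participants:
            # First time this shard is targeted: it joins as a pending participant.
            self.participants[shard] = "pending"
            request["startTransaction"] = True
        return request

    def on_ok_response(self, shard):
        """An OK response tells the router the shard satisfied the request."""
        self.participants[shard] = "confirmed"
```

In this sketch, only the first request built for a shard carries startTransaction=true; later requests to the same shard carry the txnId alone.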

The shard is considered "pending" because of the shard version protocol, which means the router won't know the shard was able to satisfy the request sent to it until it gets an OK response. To handle the case where a pending participant returns a stale version error, the router will abort the active txnId on each pending participant (not just the one that returned the error), wait for each to respond, then retry the current statement, possibly targeting a different set of shards. This allows the router to handle these errors within a transaction, instead of returning a transient error and making the client retry with a new txnId.
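
A minimal sketch of that abort-then-retry loop, under stated assumptions: run_statement, StaleVersionError, the callback parameters (send, send_abort, refresh_targets), and the retry limit are all hypothetical and exist only to illustrate the flow; send_abort is assumed to block until the shard responds.

```python
class StaleVersionError(Exception):
    """Stand-in for a stale version error returned by a shard (hypothetical)."""


def run_statement(pending, target_shards, send, send_abort, refresh_targets,
                  max_retries=3):
    """Run one statement, retrying within the same txnId on stale version errors.

    pending is the set of pending participants; it is updated in place.
    """
    for _ in range(max_retries + 1):
        try:
            responses = {}
            for shard in target_shards:
                pending.add(shard)
                responses[shard] = send(shard)
            return responses
        except StaleVersionError:
            # Abort on every pending participant, not just the one that failed,
            # and wait for each response before retrying.
            for shard in sorted(pending):
                send_abort(shard)
            pending.clear()
            # The retry may target a different set of shards.
            target_shards = refresh_targets()
    raise RuntimeError("statement still failing after retries")
```

The retry limit is an arbitrary choice for the sketch; the source does not say how many retries the router attempts.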

To enable this behavior, shards can accept a request with startTransaction=true more than once for a txnId, but only if the shard's local transaction is in the aborted state. This relies on the abort (sent to every pending participant before retrying) reaching each participant after that participant's first request with startTransaction=true; an asynchronous network does not guarantee this ordering. If the abort arrives before the first request on a shard and that shard is targeted by the retry, the shard will reject the retry because it will have an in-progress transaction at that txnId.
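
The race can be reproduced with a small model of the shard-side rule. This is a sketch, not the server's implementation: the Shard class, the state names, the deliver helper, and the "no_such_transaction" result for an abort that finds no transaction are assumptions made for illustration.

```python
class Shard:
    """Hypothetical model of a shard's per-txnId transaction state."""

    def __init__(self):
        self.txn_state = {}  # txn_id -> "in_progress" or "aborted"

    def handle_start(self, txn_id):
        """Handle a request carrying startTransaction=true."""
        if self.txn_state.get(txn_id) == "in_progress":
            # A restart is only allowed from the aborted state.
            return "rejected"
        self.txn_state[txn_id] = "in_progress"
        return "ok"

    def handle_abort(self, txn_id):
        if self.txn_state.get(txn_id) == "in_progress":
            self.txn_state[txn_id] = "aborted"
            return "ok"
        # Assumption: an abort that finds no in-progress transaction is a no-op.
        return "no_such_transaction"


def deliver(order, txn_id=1):
    """Deliver messages to a fresh shard in the given order; return the results."""
    shard = Shard()
    results = []
    for message in order:
        if message == "abort":
            results.append(shard.handle_abort(txn_id))
        else:  # "first" or "retry": both carry startTransaction=true
            results.append(shard.handle_start(txn_id))
    return results


# Expected ordering: first request, then abort, then retry.
print(deliver(["first", "abort", "retry"]))   # ['ok', 'ok', 'ok']
# Reordered by the network: the abort overtakes the first request.
print(deliver(["abort", "first", "retry"]))   # ['no_such_transaction', 'ok', 'rejected']
```

In the reordered case the late first request opens a transaction that nothing aborts, so the retry finds an in-progress transaction at the same txnId and is rejected.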

related to

SERVER-39704 Allow mongos to retry on stale version and snapshot errors within a transaction

Backlog

Assignee: Jack Mulrow
Reporter: Jack Mulrow
Participants: Githook User, Jack Mulrow
Votes: 0
Watchers: 6

Created: Feb 15 2019 09:29:54 PM UTC
Updated: Oct 29 2023 10:23:56 PM UTC
Resolved: Feb 26 2019 03:44:05 PM UTC
Confidence Status Last Update: 22/Feb/19 2:59 PM