[SERVER-60369] [txnRetryCounter] Use txnRetryCounter on router to retry on some transient errors in a client transaction Created: 30/Sep/21  Updated: 17/Nov/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Jack Mulrow
Assignee: Backlog - Cluster Scalability
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams: Cluster Scalability
Sprint: Sharding 2022-07-25, Sharding 2022-08-08, Sharding 2022-08-22, Sharding 2022-09-05, Sharding 2022-09-19, Sharding 2022-10-03, Sharding 2022-10-17, Sharding NYC 2022-10-31
Participants:

 Description   

Currently, if any shard participant in a sharded transaction returns a transient error, the error is returned to the client, which will retry after incrementing the transaction number. With the introduction of txnRetryCounter in SERVER-58752, the router running the transaction can instead catch transient errors in some situations and retry with a higher txnRetryCounter without involving the client, improving performance.

 

In this ticket, mongos will retry on all transient transaction errors encountered while executing the first statement (the first stmt id) of a transaction, regardless of the number of participants involved. To retry, mongos will start a new transaction with the same transaction number and an incremented txnRetryCounter value, and mongod will treat this request as a new transaction.
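
For illustration only, the router-side retry described above might look roughly like the following Python sketch. It is not the mongos implementation: the names RouterTxn, run_statement, and send_to_participants are hypothetical, and the max_router_retries cap is an assumption that this ticket does not specify.

```python
from dataclasses import dataclass


class TransientTransactionError(Exception):
    """Stand-in for a participant error carrying the TransientTransactionError label."""


@dataclass
class RouterTxn:
    """Hypothetical router-side state for one client transaction."""
    txn_number: int               # chosen by the client; the router never changes it
    txn_retry_counter: int = 0    # bumped by the router on each internal retry
    max_router_retries: int = 3   # assumed cap, not taken from the ticket


def run_statement(txn, is_first_statement, send_to_participants):
    """Run one statement, retrying on the router only for the first statement.

    send_to_participants(txn_number, txn_retry_counter) is a hypothetical
    callable that targets the shards and raises TransientTransactionError
    when any participant reports a transient error.
    """
    while True:
        try:
            return send_to_participants(txn.txn_number, txn.txn_retry_counter)
        except TransientTransactionError:
            can_retry = (
                is_first_statement
                and txn.txn_retry_counter < txn.max_router_retries
            )
            if not can_retry:
                # Later statements (or exhausted retries) fall back to today's
                # behavior: the error goes back to the client.
                raise
            # Same txnNumber, higher txnRetryCounter: the shards treat the
            # next attempt as a new transaction.
            txn.txn_retry_counter += 1
```

Restricting the retry to the first statement keeps the sketch simple: at that point no earlier statement has returned results to the client, so restarting the transaction on the shards is not visible to it.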

 

Note that we can revert this commit to re-add support for txnRetryCounter, and that some tests will need to be re-enabled; see the comments below.


Generated at Thu Feb 08 05:49:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.