[SERVER-78683] Handling errors due to stepdown correctly for internal transaction api Created: 05/Jul/23  Updated: 10/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yuhong Zhang Assignee: Jack Mulrow
Resolution: Unresolved Votes: 0
Labels: sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding NYC
Operating System: ALL
Sprint: Sharding NYC 2023-08-21, Sharding NYC 2023-09-04, Sharding NYC 2023-09-18, Sharding NYC 2023-10-02, Sharding NYC 2023-10-16, Sharding NYC 2023-10-30, Cluster Scalability 2023-11-13, Cluster Scalability 2023-11-27
Participants:
Linked BF Score: 5

 Description   

We have seen operations that use the internal transaction API fail during stepdowns with errors like:

Attempted to run 'update' as a retryable write with session id baa46b88-bdb2-4ca6-8b6a-7b60dce7e840 - 47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU= -  -  and transaction number 13 but the active transaction number on the session is 12

From a discussion with Jack:

"Transactions can fail with certain errors like because of stepdowns, but they should be considered “transient errors” and the txn API should retry automatically on them. I think the problem is that particular error isn’t considered transient since we’re unstashing as a retryable write, which feels wrong."
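
To illustrate the behavior described in the quote, here is a minimal sketch of a retry loop that retries only on errors carrying the "TransientTransactionError" label. It is not the server's implementation: `TxnError`, `run_with_transient_retry`, `flaky_body`, and the `max_attempts` parameter are hypothetical names chosen for this example. Only the label string is an existing MongoDB concept.

```python
# Real MongoDB error label; everything else in this sketch is hypothetical.
TRANSIENT_LABEL = "TransientTransactionError"


class TxnError(Exception):
    """Hypothetical error type carrying MongoDB-style error labels."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self.labels = frozenset(labels)


def run_with_transient_retry(txn_body, max_attempts=5):
    """Call txn_body(attempt), retrying only on transient-labeled errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_body(attempt)
        except TxnError as err:
            # An unlabeled error, or the final attempt, propagates to the caller.
            if TRANSIENT_LABEL not in err.labels or attempt == max_attempts:
                raise


def flaky_body(attempt):
    """Simulate a transaction body that is interrupted once, then succeeds."""
    if attempt == 1:
        raise TxnError("interrupted by stepdown", labels=[TRANSIENT_LABEL])
    return "committed on attempt %d" % attempt


result = run_with_transient_retry(flaky_body)
```

In this sketch, an error raised without the transient label escapes the loop on the first attempt. If the hypothesis in the quote is right, that is the path the error above takes.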

