[SERVER-78104] AsyncRequestsSender should return final inconclusive error if it exhausts its retries Created: 14/Jun/23  Updated: 05/Feb/24

Status: In Code Review
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Kshitij Gupta
Resolution: Unresolved Votes: 0
Labels: jepsen-retro-2023, sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Sharding NYC
Operating System: ALL
Backport Requested:
v7.3
Sprint: Sharding NYC 2023-06-26, Sharding NYC 2023-07-10, Sharding NYC 2023-07-24, Sharding NYC 2023-08-07, Sharding NYC 2023-08-21, Sharding NYC 2023-09-04, Sharding NYC 2023-09-18, Sharding NYC 2023-10-02, Sharding NYC 2023-10-16, Sharding NYC 2023-10-30, Cluster Scalability 2023-11-13, Cluster Scalability 2023-11-27, Cluster Scalability 2023-12-11, Cluster Scalability 2023-12-25, Cluster Scalability 2024-1-8, Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19
Participants:
Linked BF Score: 0
Story Points: 3

 Description   

The AsyncRequestsSender (ARS) retries remote requests according to the retry policy it's given, up to 3 times, and returns the final attempt's response if it exhausts this retry limit. When deciding whether to retry, it treats top-level command errors and write concern errors the same.

Some retriable errors mean a write definitely didn't occur for that attempt (like a NotWritablePrimary command error), and others are inconclusive (like a network error or a retryable write concern error). As a result, the ARS may receive an inconclusive error on some attempt, keep retrying until it exhausts its retries, then return a definitive error from the last attempt. This leads a higher-layer client to believe the write definitely did not occur, when it may have.
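
The failure mode can be illustrated with a small Python model. This is only a sketch of the behavior described above, not the server's C++ code: `Response`, `scripted`, `send_with_retries_current`, and the `HostUnreachable` error name used for the network failure are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

MAX_RETRIES = 3  # retry limit stated in the description


@dataclass(frozen=True)
class Response:
    """Hypothetical model of one remote attempt's outcome."""
    ok: bool
    error: Optional[str] = None
    retriable: bool = False
    inconclusive: bool = False  # True if the write may have been applied


def scripted(responses: Iterable[Response]) -> Callable[[], Response]:
    """Return a fake 'send' function that replays the given responses in order."""
    it = iter(responses)
    return lambda: next(it)


def send_with_retries_current(attempt: Callable[[], Response]) -> Response:
    """Model of the described behavior: return whatever the last attempt returned."""
    response = attempt()
    retries = 0
    while not response.ok and response.retriable and retries < MAX_RETRIES:
        response = attempt()
        retries += 1
    return response


network_error = Response(ok=False, error="HostUnreachable",
                         retriable=True, inconclusive=True)
not_primary = Response(ok=False, error="NotWritablePrimary",
                       retriable=True, inconclusive=False)

# First attempt is inconclusive; all three retries fail definitively.
result = send_with_retries_current(
    scripted([network_error, not_primary, not_primary, not_primary]))
print(result.error, result.inconclusive)
```

In this model the caller only sees the definitive NotWritablePrimary error, even though the first attempt may have applied the write.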

A possible fix is for the ARS to remember whether it received an inconclusive response during its retries and return the final inconclusive error it has seen (if there was one), instead of whatever the most recent attempt failed with. This is essentially a subset of SERVER-69295 without mongos also returning the NoWritesPerformed error label.
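
A minimal Python sketch of the proposed fix follows. It is a model under stated assumptions, not the actual ARS change: `Response`, `scripted`, `send_with_retries_remembering`, and the `HostUnreachable` error name are hypothetical, and the sketch assumes the remembered inconclusive error is preferred whenever the final outcome is a failure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

MAX_RETRIES = 3  # retry limit stated in the description


@dataclass(frozen=True)
class Response:
    """Hypothetical model of one remote attempt's outcome."""
    ok: bool
    error: Optional[str] = None
    retriable: bool = False
    inconclusive: bool = False  # True if the write may have been applied


def scripted(responses: Iterable[Response]) -> Callable[[], Response]:
    """Return a fake 'send' function that replays the given responses in order."""
    it = iter(responses)
    return lambda: next(it)


def send_with_retries_remembering(attempt: Callable[[], Response]) -> Response:
    """Retry like before, but remember the most recent inconclusive failure."""
    last_inconclusive: Optional[Response] = None
    response = attempt()
    retries = 0
    while True:
        if response.ok:
            return response
        if response.inconclusive:
            last_inconclusive = response  # keep the latest inconclusive error
        if not response.retriable or retries >= MAX_RETRIES:
            break
        response = attempt()
        retries += 1
    # Assumption: on failure, prefer the remembered inconclusive error, if any.
    return last_inconclusive if last_inconclusive is not None else response


network_error = Response(ok=False, error="HostUnreachable",
                         retriable=True, inconclusive=True)
not_primary = Response(ok=False, error="NotWritablePrimary",
                       retriable=True, inconclusive=False)

fixed = send_with_retries_remembering(
    scripted([network_error, not_primary, not_primary, not_primary]))
all_definitive = send_with_retries_remembering(
    scripted([not_primary, not_primary, not_primary, not_primary]))
print(fixed.error, all_definitive.error)
```

With this approach the caller sees the inconclusive error whenever any attempt might have applied the write, and still sees the definitive error when every attempt was definitive.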


Generated at Thu Feb 08 06:37:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.