[SERVER-78104] AsyncRequestsSender should return final inconclusive error if it exhausts its retries Created: 14/Jun/23 Updated: 05/Feb/24 |
|
| Status: | In Code Review |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Kshitij Gupta |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | jepsen-retro-2023, sharding-nyc-subteam2 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Sharding NYC
|
||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v7.3
|
||||||||
| Sprint: | Sharding NYC 2023-06-26, Sharding NYC 2023-07-10, Sharding NYC 2023-07-24, Sharding NYC 2023-08-07, Sharding NYC 2023-08-21, Sharding NYC 2023-09-04, Sharding NYC 2023-09-18, Sharding NYC 2023-10-02, Sharding NYC 2023-10-16, Sharding NYC 2023-10-30, Cluster Scalability 2023-11-13, Cluster Scalability 2023-11-27, Cluster Scalability 2023-12-11, Cluster Scalability 2023-12-25, Cluster Scalability 2024-1-8, Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Story Points: | 3 | ||||||||
| Description |
|
The AsyncRequestsSender (ARS) retries remote requests up to 3 times, according to the retry policy it is given, and returns the final attempt's response if it exhausts this retry limit. For retrying, it treats top-level command errors and write concern errors the same. Some retriable errors mean a write definitely did not occur for that attempt (like a NotWritablePrimary command error), while others are inconclusive (like a network error or a retryable write concern error). As a result, the ARS may receive an inconclusive error on some attempt, retry until exhausting its retries, then return a definitive error, leading a higher-layer client to believe the write definitively did not occur when it may have. A possible fix is for the ARS to remember whether it received an inconclusive response during its retries and return the final inconclusive error it has seen (if there was one) instead of whatever the most recent attempt failed with. This is essentially a subset of SERVER-69295 without mongos also returning the NoWritesPerformed error label. |
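
The proposed fix can be illustrated with a minimal sketch. This is not the ARS implementation (which is C++ inside the server); it is a simplified Python model of the idea, and every name in it (`Response`, `send_with_retries`, `scripted`, the `inconclusive` flag) is hypothetical. It assumes each failed attempt can be classified as retriable and as conclusive or inconclusive, and that "up to 3 times" means one initial attempt plus three retries. It also assumes the remembered inconclusive error is preferred on any final failure, not only when retries are exhausted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

MAX_RETRIES = 3  # retry limit stated in the ticket


@dataclass(frozen=True)
class Response:
    """Hypothetical model of one attempt's outcome."""
    ok: bool
    error: Optional[str] = None
    retriable: bool = False
    # True when the write may or may not have happened (e.g. a network error).
    inconclusive: bool = False


def send_with_retries(attempt: Callable[[], Response],
                      max_retries: int = MAX_RETRIES) -> Response:
    """Retry `attempt`, remembering the last inconclusive error seen."""
    last_inconclusive: Optional[Response] = None
    response = attempt()
    retries = 0
    while not response.ok and response.retriable and retries < max_retries:
        if response.inconclusive:
            last_inconclusive = response
        retries += 1
        response = attempt()

    if response.ok or response.inconclusive:
        # Success, or a final error that is already inconclusive.
        return response
    # Final error is definitive. If an earlier attempt was inconclusive,
    # the write may still have happened, so report that error instead.
    # (Assumption: applied to any final failure, not only exhausted retries.)
    return last_inconclusive if last_inconclusive is not None else response


def scripted(responses: Iterable[Response]) -> Callable[[], Response]:
    """Test helper: return each scripted response in order."""
    it = iter(responses)
    return lambda: next(it)


# Illustrative error values; the classification flags are assumptions.
network_error = Response(ok=False, error="HostUnreachable",
                         retriable=True, inconclusive=True)
not_primary = Response(ok=False, error="NotWritablePrimary", retriable=True)

result = send_with_retries(
    scripted([network_error, not_primary, not_primary, not_primary]))
print(result.error)
```

In this scenario the first attempt fails with a network error and the remaining three fail with NotWritablePrimary. Without the fix, the caller would see NotWritablePrimary and conclude the write did not occur; with it, the caller sees the earlier inconclusive error.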