[DRIVERS-926] Consider making ReadConcernMajorityNotAvailableYet a retryable error Created: 09/Mar/20 Updated: 04/Dec/23 |
|
| Status: | Implementing |
| Project: | Drivers |
| Component/s: | Retryability |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Major - P3 |
| Reporter: | Pavithra Vetriselvan | Assignee: | Kyle Kloberdanz |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Driver Changes: | Needed |
| Quarter: | FY24Q4 |
| Downstream Changes Summary: | Summary of necessary driver changes
Commits for syncing spec/prose tests |
| Engineering Lead: | |
| Start date: | |
| Driver Compliance: |
|
| Description |
|
This came up during testing for Safe Replica Set Reconfig. During a safe reconfig, the primary will drop snapshots after writing down a new config document. If a read is issued on this node before it updates its committed snapshot, the server fails with ReadConcernMajorityNotAvailableYet. The node should eventually be able to update the committed snapshot through heartbeats (2-second interval), so the read will eventually succeed. It seems like we should treat this as a retryable error. |
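A minimal sketch of what treating this error as retryable could look like on the driver side. It is not taken from any driver's source: `ServerError`, `is_retryable_read_error`, and `read_with_single_retry` are hypothetical names, and the set of codes is only an illustrative subset. The sketch assumes the server reports ReadConcernMajorityNotAvailableYet as error code 134.

```python
# Hypothetical sketch, not any driver's actual implementation.
READ_CONCERN_MAJORITY_NOT_AVAILABLE_YET = 134  # assumed server error code

# Illustrative subset of error codes treated as retryable for reads.
RETRYABLE_READ_CODES = {
    11600,  # InterruptedAtShutdown
    11602,  # InterruptedDueToReplStateChange
    189,    # PrimarySteppedDown
    91,     # ShutdownInProgress
    READ_CONCERN_MAJORITY_NOT_AVAILABLE_YET,  # the change proposed here
}


class ServerError(Exception):
    """Hypothetical stand-in for a driver's command-failure exception."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def is_retryable_read_error(exc):
    """Return True if the error's code is in the retryable set."""
    return isinstance(exc, ServerError) and exc.code in RETRYABLE_READ_CODES


def read_with_single_retry(run_read):
    """Run a read; on a retryable error, retry exactly once."""
    try:
        return run_read()
    except ServerError as exc:
        if not is_retryable_read_error(exc):
            raise
    # Second and final attempt: any error raised here reaches the caller.
    return run_read()
```

With this shape, a read that fails once with code 134 and then succeeds is transparent to the caller, while a second consecutive failure still surfaces the error.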
| Comments |
| Comment by Githook User [ 01/Dec/23 ] |
|
Author: {'name': 'Kyle Kloberdanz', 'email': 'kyle.kloberdanz@mongodb.com', 'username': 'kkloberdanz'} Message: DRIVERS-926 Make ReadConcernMajorityNotAvailableYet a retryable read error (#1479) |
| Comment by Shane Harvey [ 30/Nov/23 ] |
|
Making ReadConcernMajorityNotAvailableYet a retryable read error makes sense to me. |
| Comment by Pavithra Vetriselvan [ 10/Mar/20 ] |
|
Oh, got it! Thanks for clarifying. Let me know if you have any more questions about the server's behavior. |
| Comment by Shane Harvey [ 09/Mar/20 ] |
|
Sorry, I'm not trying to say that we shouldn't retry here. Just trying to point out that our current retry logic may be insufficient to actually address this scenario in practice. |
| Comment by Pavithra Vetriselvan [ 09/Mar/20 ] |
|
Hmm, I see. The reconfig doesn't cause a state transition, so the node will continue to report itself as primary. Your comment helps explain why we didn't run into this issue with rollback dropping snapshots. We kill any in-progress reads before transitioning to rollback and fail them with "InterruptedDueToReplStateChange". We also don't allow any new reads during this state. I did not realize that the drivers only retry once; that's definitely a good point. The read should eventually succeed if we retry enough, but it isn't guaranteed to succeed on the first retry. I'm curious, what is the user expected to do when receiving a "ReadConcernMajorityNotAvailableYet" error? |
| Comment by Shane Harvey [ 09/Mar/20 ] |
If a read fails with ReadConcernMajorityNotAvailableYet, does that mean the node is in an unknown SDAM state? (I.e., does the primary stop reporting itself as primary in isMaster responses?) If the answer is no, then it seems like the retry will most likely proceed immediately (without blocking in server selection) and fail with the same error. If this is the case, then it seems like retrying simply delays the error. Note that drivers only retry once.
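
The concern in this thread can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of real server or driver behavior: `FakePrimary`, `SnapshotNotReady`, and `read_with_retries` are hypothetical, the clock is simulated, and the committed snapshot is assumed to return exactly one heartbeat interval (2 seconds, per the description) after the reconfig. The delayed-retry variant is shown only for contrast; per the comments above, drivers retry once.

```python
HEARTBEAT_INTERVAL_SECS = 2.0  # heartbeat interval from the ticket description


class SnapshotNotReady(Exception):
    """Hypothetical stand-in for ReadConcernMajorityNotAvailableYet."""


class FakePrimary:
    """Simulated node that stays primary but has no committed snapshot yet."""

    def __init__(self, snapshot_ready_at):
        self.snapshot_ready_at = snapshot_ready_at

    def read(self, now):
        if now < self.snapshot_ready_at:
            raise SnapshotNotReady("ReadConcernMajorityNotAvailableYet")
        return "document"


def read_with_retries(node, max_retries, delay_secs):
    """Retry up to max_retries times, advancing a simulated clock each time.

    Returns (result, number_of_retries_used).
    """
    now = 0.0  # simulated time since the reconfig dropped the snapshot
    for attempt in range(max_retries + 1):
        try:
            return node.read(now), attempt
        except SnapshotNotReady:
            if attempt == max_retries:
                raise
            now += delay_secs


node = FakePrimary(snapshot_ready_at=HEARTBEAT_INTERVAL_SECS)

# One immediate retry: server selection does not block, so the retry
# reaches the same node in the same state and fails again.
try:
    read_with_retries(node, max_retries=1, delay_secs=0.0)
    single_retry_succeeded = True
except SnapshotNotReady:
    single_retry_succeeded = False

# Hypothetical alternative: several retries spaced 0.5 seconds apart.
result, retries_used = read_with_retries(node, max_retries=5, delay_secs=0.5)
```

In this simulation the single immediate retry fails with the same error, while the spaced retries succeed once the simulated clock reaches the heartbeat interval.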