[SERVER-63887] SnapshotUnavailable error on sharded clusters/replica sets
Created: 22/Feb/22 | Updated: 29/Oct/23 | Resolved: 29/Mar/23
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Neil Shweky (Inactive) | Assignee: | Henrik Edin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Note that I haven't been able to reproduce this locally, only on CI. Here is the patch that I ran. The code I'm using is as follows:
| Sprint: | Execution Team 2023-04-03 |
| Participants: |
| Description |
Summary
I was implementing DRIVERS-2181 in Ruby; this ticket is for implementing snapshot query example tests. The tests pass locally but very occasionally fail on CI, on both replica sets and sharded clusters. Something like this error has been reported before. See

The error is as follows:

Note that sending distinct commands seems to fix this for sharded clusters. See the comments under SERVER-39704. This still fails for replica sets.

Motivation

Who is the affected end user?
mongo-ruby-driver spec tests are failing.

How does this affect the end user?
I'm not sure that it does, since I'm having a hard time reproducing it.

How likely is it that this problem or use case will occur?
It doesn't seem very likely, since I'm having a hard time reproducing it.

If the problem does occur, what are the consequences and how severe are they?
The snapshot fails, but it seems like if you retry it, it will work. See
|
| Comments |
| Comment by Jeremy Mikola [ 29/Mar/22 ] |
|
Thinking about how we can mitigate this possible pain point for applications, I wonder if it'd be reasonable to add SnapshotUnavailable(246) to the list of retryable errors, so retryable reads could kick in. The current implementation would only afford us one additional retry attempt, which may not be sufficient, but that may change with client-side operation timeout (DRIVERS-555). |
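
As an illustration of the mitigation discussed above, here is a minimal application-side sketch of retrying on SnapshotUnavailable (246). It is not a driver implementation and not part of any driver's retryable-reads logic. The helper name `retry_on_snapshot_unavailable` and its parameters are hypothetical, and the sketch assumes the raised exception exposes the server error code as a `code` attribute.

```python
import time

# Server error code for SnapshotUnavailable, as cited in the comment above.
SNAPSHOT_UNAVAILABLE = 246


def retry_on_snapshot_unavailable(operation, max_attempts=3, delay_seconds=0.0):
    """Call operation() and retry while it fails with SnapshotUnavailable.

    Hypothetical helper: `operation` is any zero-argument callable, and the
    error is assumed to carry the server error code in a `code` attribute.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            is_snapshot_error = getattr(exc, "code", None) == SNAPSHOT_UNAVAILABLE
            # Re-raise unrelated errors, and give up once attempts are exhausted.
            if not is_snapshot_error or attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
```

Unlike the single extra attempt that the current retryable-reads behavior would give, `max_attempts` here is configurable, which matches the concern that one retry may not be sufficient.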