[SERVER-67609] Sharded time-series insert can return Interrupted after a stepdown (delayed BF-investigation) Created: 28/Jun/22 Updated: 29/Oct/23 Resolved: 12/Apr/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dan Larkin-York | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Storage Execution
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Execution Team 2023-03-06, Execution Team 2023-03-20, Execution Team 2023-04-17 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 4 | ||||||||
| Description |
|
It looks like a time-series insert can return a plain 'Interrupted' error code after a stepdown (at least in certain circumstances) rather than the expected 'InterruptedDueToReplStateChange', resulting in the operation not being retried. This resulted in BF-25238. We have a band-aid fix ( In the continuous stepdown suites, you get the 'Interrupted' error code with a default error message. In the kill_primary suites, you get the 'Interrupted' error code with a more descriptive error message about read preference, like "Write results unavailable from failing to target a host in the shard shard-rs0 :: caused by :: Could not find host matching read preference { mode: \"primary\" } for set shard-rs0". |
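To illustrate why the returned error code matters, here is a minimal sketch of a retry wrapper that treats 'InterruptedDueToReplStateChange' as retryable and a plain 'Interrupted' as fatal. It is not the server's or the test harness's actual retry logic; `WriteError`, `RETRYABLE_CODE_NAMES`, and `run_with_retry` are hypothetical names introduced only for this example.

```python
# Hypothetical sketch: only the two error code names come from the ticket.
RETRYABLE_CODE_NAMES = {"InterruptedDueToReplStateChange"}


class WriteError(Exception):
    """Stand-in for a write failure that carries an error code name."""

    def __init__(self, code_name, message=""):
        super().__init__(f"{code_name}: {message}")
        self.code_name = code_name


def run_with_retry(op, max_attempts=3):
    """Call op(), retrying only when the failure is marked retryable."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return op()
        except WriteError as err:
            # A plain 'Interrupted' is not in the retryable set, so it
            # propagates to the caller on the first failure.
            if err.code_name not in RETRYABLE_CODE_NAMES or attempt >= max_attempts:
                raise


def make_op(failures):
    """Build an op that raises the given code names in order, then succeeds."""
    calls = {"count": 0}

    def op():
        calls["count"] += 1
        if calls["count"] <= len(failures):
            raise WriteError(failures[calls["count"] - 1])
        return "ok"

    return op, calls
```

With this wrapper, an insert that fails once with 'InterruptedDueToReplStateChange' succeeds on the second attempt, while one that fails with 'Interrupted' is surfaced immediately, which matches the symptom described above.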
| Comments |
| Comment by Githook User [ 12/Apr/23 ] | ||||||||||||||||||||||||||||
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: | ||||||||||||||||||||||||||||
| Comment by Dan Larkin-York [ 15/Sep/22 ] | ||||||||||||||||||||||||||||
|
arun.banala@mongodb.com That does help narrow it down. I can dig into the time-series-specific write paths a bit and see if there's anything that can explain it. Thanks for taking a look! | ||||||||||||||||||||||||||||
| Comment by Sergi Mateo Bellido [ 01/Jul/22 ] | ||||||||||||||||||||||||||||
|
Passing this ticket to the query-execution team since they own this part. Feel free to reach out to me if you believe the problem is sharding-related. I spent a number of hours on it but didn't find any interesting leads. Thank you! | ||||||||||||||||||||||||||||
| Comment by Sergi Mateo Bellido [ 01/Jul/22 ] | ||||||||||||||||||||||||||||
|
I took a look at the two kinds of BFGs that we have:
|