[SERVER-58021] mongos should retry write upon getting ShardCannotRefreshDueToLocksHeld error from shard Created: 23/Jun/21 Updated: 29/Oct/23 Resolved: 21/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.2, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Jordi Serra Torrens |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v5.0 | ||||||||
| Sprint: | Sharding EMEA 2021-07-26 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Description |
|
Shards retry ShardCannotRefreshDueToLocksHeld locally once after refreshing the catalog cache. If the retry hits the same error (because the collection metadata was invalidated again after the refresh), the ShardCannotRefreshDueToLocksHeld error is propagated to the router. The router should retry on this error, but it does not, because the error is missing from the list of errors that write_op retries on. |
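The failure mode described above can be illustrated with a small toy model. This is only a sketch and not the server code: `ToyShard`, `router_write`, `OtherRetryableError`, and the `retryable_errors` parameter are hypothetical names invented for the illustration, and the exception class merely borrows the name of the real error. It shows a shard that retries locally exactly once, and a router that only retries errors present in its retry list.

```python
from dataclasses import dataclass


class ShardCannotRefreshDueToLocksHeld(Exception):
    """Toy stand-in that borrows the name of the real server error."""


class OtherRetryableError(Exception):
    """Hypothetical placeholder for errors already in the router's retry list."""


@dataclass
class ToyShard:
    """Hypothetical shard whose metadata is invalidated `failures_remaining` times."""

    failures_remaining: int
    attempts: int = 0

    def _try_write(self, doc):
        self.attempts += 1
        if self.failures_remaining > 0:
            self.failures_remaining -= 1
            raise ShardCannotRefreshDueToLocksHeld()
        return "ok"

    def write(self, doc):
        try:
            return self._try_write(doc)
        except ShardCannotRefreshDueToLocksHeld:
            # Catalog cache refresh elided; the shard retries exactly once.
            # A second failure propagates to the caller (the router).
            return self._try_write(doc)


def router_write(shard, doc, retryable_errors, max_attempts=5):
    """Hypothetical router loop: retry only errors listed in `retryable_errors`."""
    for attempt in range(max_attempts):
        try:
            return shard.write(doc)
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not isinstance(exc, retryable_errors):
                raise


if __name__ == "__main__":
    # Metadata invalidated twice: the shard's single local retry is not enough.
    fixed = (OtherRetryableError, ShardCannotRefreshDueToLocksHeld)
    print(router_write(ToyShard(failures_remaining=2), {"x": 1}, fixed))
```

In this model, a router whose retry list omits the error surfaces it to the caller as soon as the shard's local retry fails, while adding the error to the list lets the write succeed on the router's next attempt.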
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 21/Jul/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: (cherry picked from commit 05566100dd69d8f7c1e48fa67ec1533d04418b1a) |
| Comment by Githook User [ 21/Jul/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'} Message: |
| Comment by Blake Oler [ 09/Jul/21 ] |
|
Could we bump the priority of this? I'm running into it while trying to stand up resharding concurrency tests. jordi.serra-torrens |