[SERVER-55557] Range deletion of aborted migration can fail after a refine shard key Created: 26/Mar/21 Updated: 29/Oct/23 Resolved: 28/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.4, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Jordi Serra Torrens |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: | v5.0 | ||||||||||||||||
| Sprint: | Sharding EMEA 2021-05-31 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 27 | ||||||||||||||||
| Description |
|
At the end of _configsvrRefineCollectionShardKey, the command triggers a best-effort, fire-and-forget refresh on the shards that own chunks. Because it is best effort, the shards are not guaranteed to actually refresh. Consider a shard that had cached metadata for the collection but has not successfully refreshed since the refineCollectionShardKey. If this shard is later the recipient of a chunk migration that gets aborted, then when it goes to execute the range deletion it will believe the collection still has the old shard key. However, the range boundaries in the task use the new, refined shard key, so the call to KeyPattern::extendRangeBound fails. |
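The mismatch can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: `extend_range_bound` here is a hypothetical stand-in that assumes the bound's fields must form a prefix of the key pattern. It is not the server's KeyPattern::extendRangeBound, and the key patterns and bounds are made-up values.

```python
# Toy model only: NOT the server's KeyPattern::extendRangeBound.
MIN_KEY = object()  # hypothetical stand-in for a "smallest possible" key value


def extend_range_bound(key_pattern, bound):
    """Pad `bound` out to `key_pattern`.

    Assumption for this sketch: the bound's field names must be a prefix
    of the key pattern's field names, otherwise the call fails.
    """
    pattern_fields = list(key_pattern)
    bound_fields = list(bound)
    if pattern_fields[:len(bound_fields)] != bound_fields:
        raise ValueError(
            f"bound fields {bound_fields} do not match key pattern {pattern_fields}"
        )
    extended = dict(bound)
    for field in pattern_fields[len(bound_fields):]:
        extended[field] = MIN_KEY
    return extended


def try_extend(key_pattern, bound):
    """Return "ok" or "failed" instead of raising, for easy comparison."""
    try:
        extend_range_bound(key_pattern, bound)
        return "ok"
    except ValueError:
        return "failed"


old_key = {"a": 1}              # what the stale shard still has cached
refined_key = {"a": 1, "b": 1}  # shard key after refineCollectionShardKey
task_min = {"a": 10, "b": 0}    # range deletion bound written with the refined key

stale_result = try_extend(old_key, task_min)      # shard never refreshed
fresh_result = try_extend(refined_key, task_min)  # shard refreshed
```

In this model the stale shard fails because the bound carries a field (`b`) that its cached key pattern does not know about, while a refreshed shard handles the same bound without error.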
| Comments |
| Comment by Githook User [ 27/Oct/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'}Message: |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 28/May/21 ] |
|
Author: {'name': 'Jordi Serra Torrens', 'email': 'jordi.serra-torrens@mongodb.com', 'username': 'jordist'}Message: |
| Comment by Esha Maharishi (Inactive) [ 05/Apr/21 ] |
|
This is a duplicate of |
| Comment by Jordi Serra Torrens [ 26/Mar/21 ] |
|
A couple of alternatives for addressing this:
a) We could catch this error and refresh the metadata, so that the next time this range deletion task is retried the shard knows about the new shard key.
b) We could make the shard refresh triggered by _configsvrRefineCollectionShardKey required for correctness (instead of best-effort fire-and-forget), and ensure that the shards successfully refresh and flush the refresh with majority write concern before _configsvrRefineCollectionShardKey returns.
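Alternative (a) can be sketched as a catch, refresh, and retry loop. The Python below is a hypothetical model, not the server's code: `Shard`, `StaleShardKeyError`, and `run_task_with_retry` are names invented for illustration, and the "config" key pattern stands in for whatever authoritative metadata a real refresh would fetch.

```python
class StaleShardKeyError(Exception):
    """Hypothetical error: the task's bounds do not fit the cached shard key."""


class Shard:
    def __init__(self, cached_key, config_key):
        self.cached_key = list(cached_key)  # possibly stale local view
        self.config_key = list(config_key)  # stand-in for authoritative metadata
        self.refreshes = 0

    def refresh(self):
        # Replace the cached shard key with the authoritative one.
        self.cached_key = list(self.config_key)
        self.refreshes += 1

    def delete_range(self, min_bound, max_bound):
        # Assumption: each bound's fields must be a prefix of the cached key.
        for bound in (min_bound, max_bound):
            if list(bound) != self.cached_key[:len(bound)]:
                raise StaleShardKeyError(
                    f"bound {list(bound)} vs cached key {self.cached_key}"
                )
        return True


def run_task_with_retry(shard, min_bound, max_bound, max_attempts=2):
    """Run the deletion; on a key mismatch, refresh and try again.

    Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            shard.delete_range(min_bound, max_bound)
            return attempt
        except StaleShardKeyError:
            shard.refresh()
    raise RuntimeError("range deletion still failing after refresh")


shard = Shard(cached_key=["a"], config_key=["a", "b"])
attempts = run_task_with_retry(shard, {"a": 0, "b": 0}, {"a": 10, "b": 0})
```

The trade-off in this sketch is that the first attempt is wasted whenever the shard is stale, whereas alternative (b) would pay the refresh cost up front, inside the refine command itself.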