[SERVER-66046] Resharding coordinator won't automatically abort the resharding operation when a recipient shard errors during its applying phase Created: 28/Apr/22 Updated: 29/Oct/23 Resolved: 08/Jun/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 5.3.0, 5.0.0, 6.0.0-rc3 |
| Fix Version/s: | 5.0.10, 6.0.0-rc10, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Nandini Bhartiya |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.0 |
| Sprint: | Sharding NYC 2022-05-30, Sharding NYC 2022-06-13 |
| Participants: | |
| Story Points: | 3 |
| Description |
|
While the recipient shards are in RecipientStateEnum::kApplying, they continuously fetch the oplog entries for writes on the donor shards and apply them. If there's an operation-fatal error while applying an oplog entry, the recipient shard will transition to RecipientStateEnum::kError and inform the coordinator shard.
While the recipient shards are in RecipientStateEnum::kApplying, the coordinator shard is monitoring for an opportune moment to commit the resharding operation based on how caught up the recipient shards are to the writes on the donor shards. The coordinator shard won't realize that the recipient shards will never reach an opportune moment to commit, because the resharding operation must abort, so it does not abort the operation automatically.
An operator can manually issue the abortReshardCollection command to cancel the resharding operation. |
| Comments |
| Comment by Githook User [ 09/Jun/22 ] |
|
Author: {'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'} Message: |
| Comment by Githook User [ 09/Jun/22 ] |
|
Author: {'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'} Message: (cherry picked from commit f016b1053908e031dbcec48ffb0a30fa63ba7e3d) |
| Comment by Githook User [ 08/Jun/22 ] |
|
Author: {'name': 'nandinibhartiyaMDB', 'email': 'nandini.bhartiya@mongodb.com', 'username': 'nandinibhartiyaMDB'} Message: |
| Comment by Max Hirschhorn [ 09/May/22 ] |
|
We think a possible solution would be to wait on whenAny(_canEnterCritical.getFuture(), _reshardingCoordinatorObserver->awaitAllRecipientsInStrictConsistency()), so that the coordinator fails early when a recipient shard will never reach strict consistency after the commit monitor has been started. |
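
The idea in the comment above can be illustrated with a minimal, standalone sketch. It is not the server's code and does not use the server's future library: ApplyingPhaseSignals, WaitOutcome, and coordinatorShouldCommit are hypothetical names, and the sketch models only the error path of the second future, assuming that it resolves with an error once a recipient reports kError. The point is that the coordinator waits on both events at once and acts on whichever fires first, instead of waiting only for the commit monitor.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>

// Hypothetical outcome of the coordinator's wait during the applying phase.
enum class WaitOutcome { kCanEnterCritical, kRecipientError };

// Hypothetical stand-in for the two futures passed to whenAny() in the
// comment above. This is an illustration, not the server's implementation.
class ApplyingPhaseSignals {
public:
    // Models the commit monitor deciding the recipients are caught up.
    void signalCanEnterCritical() { _set(WaitOutcome::kCanEnterCritical); }

    // Models a recipient shard reporting RecipientStateEnum::kError.
    void signalRecipientError() { _set(WaitOutcome::kRecipientError); }

    // Blocks until either signal has fired and returns the first one.
    WaitOutcome waitForAny() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _outcome.has_value(); });
        assert(_outcome.has_value());
        return *_outcome;
    }

private:
    void _set(WaitOutcome outcome) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (_outcome) {
                return;  // The first signal wins; later ones are ignored.
            }
            _outcome = outcome;
        }
        _cv.notify_all();
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::optional<WaitOutcome> _outcome;
};

// Returns true if the coordinator may proceed toward commit, or false if it
// should abort because a recipient failed while applying oplog entries.
bool coordinatorShouldCommit(ApplyingPhaseSignals& signals) {
    return signals.waitForAny() == WaitOutcome::kCanEnterCritical;
}
```

In this sketch, a thread that calls signalRecipientError() while the coordinator is blocked in coordinatorShouldCommit() makes the call return false right away, which is the "fail early" behavior the comment asks for. Waiting only on the commit-monitor signal would leave the coordinator blocked until an operator intervened.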