[SERVER-58592] Make ReshardingCoordinatorService more robust when stepdowns happen near the end of a resharding operation. Created: 15/Jul/21 Updated: 29/Oct/23 Resolved: 03/Aug/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.4, 5.1.0-rc0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Kshitij Gupta | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-lifecycle | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Requested: | v5.0 | ||||||||
| Sprint: | Sharding 2021-07-26, Sharding 2021-08-09 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 120 | ||||||||
| Story Points: | 2 | ||||||||
| Description |
|
In our current implementation of the resharding coordinator, when resharding is done, we first remove the on-disk coordinator document and then clean up the in-memory state (i.e., completing/stepping down the metrics). This ordering can cause issues. Consider the case in the BF: there is a stepdown after the coordinator document has been deleted but before the in-memory state has been cleaned up. Since the coordinator document has been deleted, this instance is removed from the _activeInstances map in PrimaryOnlyService by the PrimaryOnlyServiceOpObserver. After this config server primary (referred to as primary_1 from here on) steps down, a new primary will step up. Since the old document and instance were deleted, this new primary won't resume the same resharding operation and will wait for the next resharding operation. When primary_1 steps up as primary again, it will still have the uncleaned in-memory state from the original resharding operation, which will conflict with the in-memory state of any new resharding operation. |
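The ordering problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the server code and not necessarily the fix that was committed: `Node`, `StepDown`, `finish_delete_first`, `finish_clean_first`, and `run_scenario` are all hypothetical names, the "disk" is a plain Python set, and the alternative ordering (clean in-memory state before deleting the document) is just one plausible way to avoid the stale state.

```python
class StepDown(Exception):
    """Raised to simulate a stepdown interrupting the coordinator."""


class Node:
    """Hypothetical stand-in for one config server node."""

    def __init__(self, name):
        self.name = name
        # In-memory state (e.g. metrics) for operations this node believes are active.
        self.in_memory_ops = set()

    def start_op(self, op_id):
        # Leftover state from an earlier operation conflicts with a new one.
        if self.in_memory_ops:
            raise RuntimeError(
                f"{self.name}: stale state {sorted(self.in_memory_ops)} "
                f"conflicts with {op_id}"
            )
        self.in_memory_ops.add(op_id)


def finish_delete_first(node, disk_docs, op_id, stepdown_between=False):
    """Order described in the ticket: delete the document, then clean memory."""
    disk_docs.discard(op_id)
    if stepdown_between:
        raise StepDown()
    node.in_memory_ops.discard(op_id)


def finish_clean_first(node, disk_docs, op_id, stepdown_between=False):
    """Alternative order (an assumption): clean memory, then delete the document."""
    node.in_memory_ops.discard(op_id)
    if stepdown_between:
        raise StepDown()
    disk_docs.discard(op_id)


def run_scenario(finish):
    """Return True if primary_1 can start a new operation after stepping up again."""
    disk_docs = {"op1"}
    primary_1 = Node("primary_1")
    primary_2 = Node("primary_2")

    primary_1.start_op("op1")
    try:
        finish(primary_1, disk_docs, "op1", stepdown_between=True)
    except StepDown:
        pass  # primary_1 steps down in the middle of cleanup

    # The new primary resumes only operations that still have a document on disk.
    for op_id in list(disk_docs):
        primary_2.start_op(op_id)
        finish(primary_2, disk_docs, op_id)

    # primary_1 steps up again and a new resharding operation begins.
    try:
        primary_1.start_op("op2")
        return True
    except RuntimeError:
        return False
```

In this model, `run_scenario(finish_delete_first)` returns `False` because primary_1 still holds state for `op1`, while `run_scenario(finish_clean_first)` returns `True` because the leftover document lets the new primary resume and finish the operation.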
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 22/Sep/21 ] |
|
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'} Message: (cherry picked from commit a33a04b6186ea5b56c1c9228ed19c41061f80749) |
| Comment by Githook User [ 02/Aug/21 ] |
|
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'} Message: |