Type: New Feature
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.4, 5.1.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels:
- PM-234-M3
- PM-234-T-lifecycle

Backwards Compatibility: Fully Compatible
Backport Requested: v5.0
Sprint: Sharding 2021-07-26, Sharding 2021-08-09
Linked BF Score: 120
Story Points: 2
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In our current implementation of the resharding coordinator, when resharding is done, we first remove the on-disk coordinator document and then clean up the in-memory state (i.e., completing/stepping down the metrics). This ordering can cause issues. Consider the case in the BF: a stepdown occurs after the coordinator document has been deleted but before the in-memory state has been cleaned up. Because the coordinator document has been deleted, the PrimaryOnlyServiceOpObserver removes this instance from the _activeInstances map in PrimaryOnlyService. After this config server primary (referred to as primary_1 from here on) steps down, a new primary steps up. Since the old document and instance were deleted, the new primary won't resume the same resharding operation and will instead wait for the next one. When primary_1 later steps up again as primary, it still holds the uncleaned in-memory state from the original resharding operation, which will conflict with the in-memory state of any new resharding operation.
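The sequence above can be sketched as a toy model. This is not MongoDB code: `Node`, `finish`, `metrics_owner`, and the `docs` set are hypothetical names invented for illustration, and the real coordinator and PrimaryOnlyService are considerably more involved. The sketch only shows how a stepdown between the two cleanup steps leaves stale in-memory state behind when the document is deleted first. The `clean_first=True` path is included purely for contrast and is an assumption, not a statement of the fix that was committed.

```python
class Node:
    """Toy stand-in for a config server node (hypothetical, not MongoDB code)."""

    def __init__(self, name):
        self.name = name
        self.active_instances = {}  # op id -> instance, kept in sync with the documents
        self.metrics_owner = None   # op id that currently owns the in-memory metrics

    def start(self, op_id, docs):
        # A new operation conflicts with metrics left over from an earlier one.
        if self.metrics_owner is not None:
            raise RuntimeError(
                f"{self.name}: metrics still owned by {self.metrics_owner}"
            )
        docs.add(op_id)
        self.active_instances[op_id] = object()
        self.metrics_owner = op_id

    def delete_document(self, op_id, docs):
        # Deleting the document also drops the instance, mimicking the
        # op observer behaviour described in the ticket.
        docs.discard(op_id)
        self.active_instances.pop(op_id, None)

    def clean_metrics(self):
        self.metrics_owner = None


def finish(node, op_id, docs, clean_first, stepdown_between=False):
    """Run the two cleanup steps in the chosen order, optionally interrupted."""
    steps = [node.clean_metrics, lambda: node.delete_document(op_id, docs)]
    if not clean_first:
        steps.reverse()  # the ordering described in the ticket
    steps[0]()
    if stepdown_between:
        return  # the node stops being primary before the second step runs
    steps[1]()


if __name__ == "__main__":
    docs = set()  # stands in for the replicated coordinator collection
    primary_1 = Node("primary_1")
    primary_1.start("op1", docs)
    finish(primary_1, "op1", docs, clean_first=False, stepdown_between=True)
    print("documents:", docs, "stale metrics owner:", primary_1.metrics_owner)
```

In this model, deleting the document first and then stepping down leaves `docs` empty, so no other node has anything to resume, while `primary_1.metrics_owner` still names the old operation and a later `start` on that node raises. With `clean_first=True`, the same interruption leaves the document in place and no stale metrics on the node.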

Assignee: Randolph Tan
Reporter: Kshitij Gupta (Inactive)
Participants: Githook User, Kshitij Gupta, Randolph Tan, Vivian Ge
Votes: 0
Watchers: 2

Created: Jul 15 2021 05:17:17 PM UTC
Updated: Oct 29 2023 09:50:49 PM UTC
Resolved: Aug 03 2021 02:20:53 PM UTC
Confidence Status Last Update: 20/Jul/21 3:31 PM
