Type: Bug
Resolution: Fixed
Priority: Critical - P2
Fix Version/s: 5.2.0, 5.0.5, 5.1.1
Affects Version/s: 5.0.0, 5.1.0
Component/s: Sharding
Labels:
- sharding-nyc-subteam1

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v5.1, v5.0
Sprint:
Sharding 2021-11-29
Linked BF Score:
159
Story Points:
2
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

PrimaryOnlyService::onStepUp() waits for stepUpOpTime to become majority-committed before attempting to rebuild any Instances. For a new optime to become majority-committed, secondaries must be able to read new entries from a forward-scanning oplog cursor. That in turn requires that no outstanding storage transaction still holds a reserved oplog slot (also known as a hole in the oplog).
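The effect of an oplog hole on a forward-scanning cursor can be sketched with a toy model. This is only an illustration, not server code: the function name `oplog_visible_through`, the integer timestamps, and the assumption that the reserved slot precedes the optime being waited on are all hypothetical.

```python
def oplog_visible_through(committed, reserved):
    """Return the highest timestamp a forward-scanning reader may see.

    committed: timestamps of oplog entries whose transactions have committed.
    reserved:  timestamps of oplog slots held by still-open transactions (holes).
    Entries at or beyond the earliest hole stay invisible until the hole closes.
    """
    if not committed:
        return None
    if not reserved:
        return max(committed)
    first_hole = min(reserved)
    visible = [ts for ts in committed if ts < first_hole]
    return max(visible) if visible else None


# Hypothetical numbers: the optime being waited on is 5, and an open
# storage transaction still holds the slot at timestamp 4.
target_optime = 5
visible = oplog_visible_through(committed=[1, 2, 3, 5], reserved=[4])
print(visible)  # 3

# Secondaries cannot fetch the entry at 5, so it cannot become majority-committed.
print(visible is not None and visible >= target_optime)  # False
```

Once the transaction holding slot 4 commits or aborts, the same call with `reserved=[]` returns 5 and the wait can finish.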

ReshardingOpObserver::onUpdate() attempts to get the ReshardingCoordinator and its associated ReshardingCoordinatorObserver in order to update their in-memory states. Doing so must wait until the ReshardingCoordinatorService has finished rebuilding. However, ReshardingOpObserver::onUpdate() currently performs this wait while its storage transaction is still active and after it has acquired an oplog slot for the update to config.reshardingOperations. If the ReshardingCoordinatorService had not already been rebuilt before the update to the config.reshardingOperations collection came in from the donor or recipient shard, then it will never finish rebuilding. Replication on the config server also cannot make progress while the oplog hole is present.
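The circular wait described above can be made explicit as a wait-for graph. The sketch below is a simplified model under stated assumptions: the node labels are shorthand for the steps in this description, `find_cycle` is a hypothetical helper, and the second graph only shows what happens if the oplog slot is not held during the wait. It is not a description of the committed fix.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge found: slice out the loop and close it.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# Labels are shorthand for the steps in the description (illustrative only).
UPDATE = "onUpdate() holding oplog slot"
REBUILD = "ReshardingCoordinatorService rebuilt"
MAJORITY = "stepUpOpTime majority-committed"
SECONDARIES = "secondaries read past oplog hole"

with_slot_held = {
    UPDATE: [REBUILD],        # onUpdate() waits for the service to rebuild
    REBUILD: [MAJORITY],      # onStepUp() rebuilds only after majority commit
    MAJORITY: [SECONDARIES],  # majority commit needs secondaries to read the oplog
    SECONDARIES: [UPDATE],    # the hole closes only when the update's transaction ends
}
cycle = find_cycle(with_slot_held)
print(" -> ".join(cycle))

# Hypothetical variant: the wait happens without a reserved oplog slot,
# so secondaries no longer depend on the update finishing.
without_slot_held = dict(with_slot_held, **{SECONDARIES: []})
print(find_cycle(without_slot_held))  # None
```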

is caused by

SERVER-49572 Implement onReshardingParticipantTransition in the ReshardingCoordinatorObserver

Closed

related to

SERVER-61483 Resharding coordinator fails to recover abort decision on step-up, attempts to commit operation as success, leading to data inconsistency

Closed

SERVER-61607 Accept DuplicateKey as a possible error in resharding_nonblocking_coordinator_rebuild.js

Closed

Assignee: Max Hirschhorn
Reporter: Max Hirschhorn
Participants: Githook User, Max Hirschhorn
Votes: 0
Watchers: 3

Created: Nov 15 2021 03:28:16 PM UTC
Updated: Oct 29 2023 09:46:02 PM UTC
Resolved: Nov 17 2021 12:39:40 PM UTC
Confidence Status Last Update: 15/Nov/21 3:30 PM
