- Type: Bug
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: 5.0.0, 5.1.0
- Component/s: Sharding
- Fully Compatible
- ALL
- v5.1, v5.0
- Sharding 2021-11-29
- 159
- 2
PrimaryOnlyService::onStepUp() waits for stepUpOpTime to become majority-committed before attempting to rebuild any Instances. For a new optime to become majority-committed, secondaries must be able to read new entries from a forward-scanning oplog cursor, which in turn depends on there being no outstanding storage transactions with oplog slots still reserved (that is, no hole in the oplog).
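The effect of an oplog hole on a forward-scanning reader can be illustrated with a toy model. This is only a sketch, not MongoDB server code: the function name `visible_prefix` and the use of plain integers as oplog slots are assumptions made for illustration.

```python
def visible_prefix(reserved_slots, committed_slots):
    """Return the slots a forward-scanning reader may see.

    The reader stops at the first slot that is reserved but not yet
    committed (a "hole"), even if later slots are already committed.
    """
    visible = []
    for slot in sorted(reserved_slots):
        if slot not in committed_slots:
            break  # hole: nothing at or after this slot is readable yet
        visible.append(slot)
    return visible


# Slot 3 is reserved by a storage transaction that has not committed.
print(visible_prefix({1, 2, 3, 4, 5}, {1, 2, 4, 5}))     # [1, 2]
# Once the transaction holding slot 3 commits, the rest becomes readable.
print(visible_prefix({1, 2, 3, 4, 5}, {1, 2, 3, 4, 5}))  # [1, 2, 3, 4, 5]
```

In this model, entries 4 and 5 stay invisible for as long as slot 3 is held, which is why a long-lived storage transaction holding a slot can stall everything behind it.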
ReshardingOpObserver::onUpdate() attempts to get the ReshardingCoordinator and its associated ReshardingCoordinatorObserver so that it can update their in-memory state. Doing so must wait until the ReshardingCoordinatorService has finished rebuilding. However, ReshardingOpObserver::onUpdate() currently performs that wait with its storage transaction still active and after having acquired an oplog slot for the update to config.reshardingOperations. If the ReshardingCoordinatorService had not already been rebuilt before the update to the config.reshardingOperations collection came in from the donor or recipient shard, then it will never finish rebuilding, and replication on the config server cannot make progress while the oplog hole is present.
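The circular dependency described above can be sketched as a wait-for graph. The following is a hypothetical model, not MongoDB server code: the node names (`onUpdate`, `rebuild`, `majority_commit`, `oplog_read`) and the helper `find_cycle` are invented for illustration. The second graph only shows what happens in the model when the wait edge is removed; it is not a statement of how the issue was actually fixed.

```python
def find_cycle(waits_for):
    """Return a list of nodes forming a cycle in a wait-for graph, or None."""
    def visit(node, path):
        if node in path:
            # Reached a node already on the current path: report the loop.
            return path[path.index(node):] + [node]
        for nxt in waits_for.get(node, []):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Each key waits for every item in its list.
blocking = {
    "onUpdate": ["rebuild"],             # holds an oplog slot while waiting
    "rebuild": ["majority_commit"],      # onStepUp() waits for stepUpOpTime
    "majority_commit": ["oplog_read"],   # secondaries must read the oplog
    "oplog_read": ["onUpdate"],          # the hole closes when onUpdate ends
}
print(find_cycle(blocking))
# ['onUpdate', 'rebuild', 'majority_commit', 'oplog_read', 'onUpdate']

# Hypothetical variant: onUpdate does not wait while holding the slot.
non_blocking = dict(blocking, onUpdate=[])
print(find_cycle(non_blocking))  # None
```

The cycle in the first graph corresponds to the hang: every participant is waiting on something that can only complete after the participant itself finishes.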
- is caused by
  - SERVER-49572 Implement onReshardingParticipantTransition in the ReshardingCoordinatorObserver (Closed)
- related to
  - SERVER-61483 Resharding coordinator fails to recover abort decision on step-up, attempts to commit operation as success, leading to data inconsistency (Closed)
  - SERVER-61607 Accept DuplicateKey as a possible error in resharding_nonblocking_coordinator_rebuild.js (Closed)