Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc2
Affects Version/s: 5.0.0, 5.1.0-rc0
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Sprint: Sharding 2021-11-01
Story Points: 1
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Following a retryable error, the primary shard for the database running the _shardsvrReshardCollection command re-sends the _configsvrReshardCollection command to the config server primary. The new invocation of _configsvrReshardCollection joins the existing ReshardingCoordinator instance rather than constructing a new one. However, in this situation setAlwaysInterruptAtStepDownOrUp() won't have been called on the OperationContext for the _configsvrReshardCollection command. Neither the future that signals the coordinator document has been written nor the future that signals completion of the resharding operation is guaranteed to become ready with an error on stepdown or shutdown. As a result, the _configsvrReshardCollection command can continue running on the config server node after it has stepped down.

We should call setAlwaysInterruptAtStepDownOrUp() before waiting on these futures so that, if the config server primary steps down, the wait is interrupted and the primary shard for the database running the _shardsvrReshardCollection command re-sends the _configsvrReshardCollection command to the new config server primary.

if (auto existingInstance =
        getExistingInstanceToJoin(opCtx, nss, request().getKey())) {
    // Join the existing resharding operation to prevent generating a new resharding
    // instance if the same command is issued consecutively due to client disconnect
    // etc.
    reshardCollectionJoinedExistingOperation.pauseWhileSet(opCtx);
    existingInstance.get()->getCoordinatorDocWrittenFuture().get(opCtx);
    return existingInstance;
}
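
The difference the proposed call makes can be illustrated with a toy model. The sketch below does not use the server's real classes: MockOperationContext, WaitResult, waitOnce() and joinExistingOperation() are hypothetical stand-ins invented for illustration, and the only name taken from this ticket is setAlwaysInterruptAtStepDownOrUp(). It assumes a simplified rule in which a stepdown kills an operation only if the operation has opted in beforehand.

```cpp
// Toy model only: these types are hypothetical and are NOT the server's
// real OperationContext or future classes.
enum class WaitResult { kReady, kInterrupted, kStillWaiting };

class MockOperationContext {
public:
    // Opt in to being interrupted when the node steps down or up.
    void setAlwaysInterruptAtStepDownOrUp() { _alwaysInterrupt = true; }

    // Simulates the node stepping down: only opted-in operations are killed.
    void onStepDown() {
        if (_alwaysInterrupt) {
            _killed = true;
        }
    }

    bool isKilled() const { return _killed; }

private:
    bool _alwaysInterrupt = false;
    bool _killed = false;
};

// One polling step of an interruptible wait on a future.
WaitResult waitOnce(const MockOperationContext& opCtx, bool futureReady) {
    if (opCtx.isKilled()) {
        return WaitResult::kInterrupted;
    }
    return futureReady ? WaitResult::kReady : WaitResult::kStillWaiting;
}

// Simulates a command that joins an existing operation and then waits on a
// future that never becomes ready, while a stepdown happens in between.
WaitResult joinExistingOperation(bool callSetAlwaysInterrupt) {
    MockOperationContext opCtx;
    if (callSetAlwaysInterrupt) {
        opCtx.setAlwaysInterruptAtStepDownOrUp();  // the proposed change
    }
    opCtx.onStepDown();  // config server primary steps down
    return waitOnce(opCtx, /*futureReady=*/false);
}
```

In this model, joinExistingOperation(false) keeps waiting after the stepdown, which mirrors the reported behavior, while joinExistingOperation(true) is interrupted so the caller can retry against the new primary.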

is depended on by: SERVER-57686 We need test coverage that runs resharding in the face of elections (Closed)

Assignee: Max Hirschhorn
Reporter: Max Hirschhorn
Participants: Githook User, Max Hirschhorn
Votes: 0
Watchers: 2

Created: Oct 20 2021 03:44:31 PM UTC
Updated: Oct 29 2023 09:47:08 PM UTC
Resolved: Oct 21 2021 10:24:11 PM UTC
