- Type: Bug
- Resolution: Won't Do
- Priority: Major - P3
- Affects Version/s: 3.4.0, 3.6.0, 4.0.0, 4.2.0
- Component/s: Sharding
- Sharding EMEA
- ALL
-
The changes from cc8e8a1 as part of SERVER-26307 made it so that a BalancerInterrupted error response is no longer returned when the moveChunk command fails due to a retryable error on the replica set shard primary. Additionally, the changes from 53efde3 as part of SERVER-25999 made it so that MigrationManager::_processRemoteCommandResponse() returns an OperationFailed error status. However, any error status other than BalancerInterrupted is converted to an ok=1 response as long as the chunk has been successfully migrated. This conversion does not check whether _waitForDelete=true was specified in the moveChunk command request, so it does not account for the possibility that we have not waited long enough for the range to be cleaned up.
We should either (a) wait long enough for the range deletion to complete, or (b) preserve the OperationFailed error response as a way to inform the user.
```cpp
Status commandStatus = _processRemoteCommandResponse(
    remoteCommandResponse, &statusWithScopedMigrationRequest.getValue());

// Migration calls can be interrupted after the metadata is committed but before the command
// finishes the waitForDelete stage. Any failovers, therefore, must always cause the moveChunk
// command to be retried so as to assure that the waitForDelete promise of a successful command
// has been fulfilled.
// Line in question: any error other than BalancerInterrupted becomes OK once the chunk is on
// the recipient shard, regardless of whether waitForDelete was requested.
if (chunk->getShardId() == migrateInfo.to && commandStatus != ErrorCodes::BalancerInterrupted) {
    return Status::OK();
}
```
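To make the difference concrete, here is a minimal, self-contained model of the check above and of option (b). It does not use the real server types: `Code`, `currentBehavior()`, and `proposedBehavior()` are hypothetical stand-ins for `ErrorCodes`, `Status`, and the condition in MigrationManager, and the fall-through path simply returns the status (the real code may retry instead).

```cpp
#include <cassert>  // used by the example checks that exercise the model

// Hypothetical stand-in for a few mongo::ErrorCodes values; not the real type.
enum class Code { OK, BalancerInterrupted, OperationFailed };

// Models the existing check: once the chunk is owned by the recipient ("to") shard,
// every status except BalancerInterrupted is reported to the caller as OK.
// The fall-through simply returns the status; the real code may retry instead.
Code currentBehavior(bool chunkOnRecipient, Code commandStatus) {
    if (chunkOnRecipient && commandStatus != Code::BalancerInterrupted) {
        return Code::OK;
    }
    return commandStatus;
}

// Models option (b): if the caller asked for waitForDelete, a non-OK status is
// preserved even though the chunk moved, because the range deletion may not have
// been waited on.
Code proposedBehavior(bool chunkOnRecipient, Code commandStatus, bool waitForDelete) {
    if (chunkOnRecipient && commandStatus != Code::BalancerInterrupted) {
        if (waitForDelete && commandStatus != Code::OK) {
            return commandStatus;  // e.g. OperationFailed reaches the user
        }
        return Code::OK;
    }
    return commandStatus;
}
```

Under this model, an OperationFailed reply for a chunk that already reached the recipient is reported as OK today, but is surfaced to the caller when waitForDelete was requested. Callers that did not request waitForDelete keep the existing ok=1 behavior.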
- is caused by
  - SERVER-26307 MigrationManager can keep a migration document when not in stepdown / shutdown because it can't differentiate between its own error codes and those of the shard with which it communicates (Closed)
- is related to
  - SERVER-25999 Mongos applies errors received from config server as config server errors, rather than a shard the config server calls and returns the error from (Closed)
  - SERVER-42192 Write a concurrency workload to test that orphaned ranges are always deleted and nothing that shouldn’t be deleted gets deleted (Closed)
- related to
  - SERVER-53094 Tests which use {waitForDelete:true} on moveChunk are not safe to run in the sharding_csrs_continuous_config_stepdown suite (Closed)
  - SERVER-66716 WaitForDelete may not be honored in case of retry (Closed)
  - SERVER-64181 Remove TODO listed in SERVER-46669 (Closed)