Core Server / SERVER-61483

Resharding coordinator fails to recover abort decision on step-up, attempts to commit operation as success, leading to data inconsistency

    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Sharding 2021-11-29
    • 2

      The ReshardingCoordinator relies on an exception being thrown and its .onError() handler being called to trigger its _shardsvrAbortReshardCollection flow. However, on step-up the ReshardingCoordinator fails to read the current state of the coordinator document, so it never triggers the _shardsvrAbortReshardCollection flow when an earlier config server primary had already decided the resharding operation must abort. Because the .onError() handler is never called, the ReshardingCoordinator attempts to commit the resharding operation anyway. This is severely problematic because the resulting collection will be incomplete and inconsistent (i.e. lost writes).

      • Shards which had already received the _shardsvrAbortReshardCollection command from the earlier config server primary's resharding coordinator may have dropped the temporary resharding collection already. These shards effectively ignore the _shardsvrCommitReshardCollection command.
      • Other shards which erroneously receive the _shardsvrCommitReshardCollection command will rename the temporary resharding collection over the source collection.
        • Even shards which voted to abort the resharding operation (e.g. due to an unrecoverable error during collection cloning or oplog application) can still rename the temporary resharding collection over the source collection.
        • However, shards which are in neither the "strict-consistency" state (recipient role) nor the "blocking-writes" state (donor role) will reject the _shardsvrCommitReshardCollection command. The ReshardCollectionInProgress error response returned to the resharding coordinator will lead the config server primary to fassert(). While fassert(5277000) is an indicator of this issue occurring, it isn't guaranteed that any shards will still be in a state to detect that the resharding coordinator delivered different decisions to different shards.
      {"t":{"$date":"2021-11-14T16:37:49.291+00:00"},"s":"E",  "c":"ASSERT",   "id":4457000, "ctx":"conn84","msg":"Tripwire assertion","attr":{"error":{"code":338,"codeName":"ReshardCollectionInProgress","errmsg":"Attempted to commit the resharding operation in an incorrect state"},"location":"{fileName:\"src/mongo/db/s/resharding/resharding_recipient_service.cpp\", line:918, functionName:\"operator()\"}"}}
      {"t":{"$date":"2021-11-14T16:38:00.557+00:00"},"s":"F",  "c":"RESHARD",  "id":5277000, "ctx":"ReshardingCoordinatorService-1","msg":"Unrecoverable error past the point resharding was guaranteed to succeed","attr":{"error":"ReshardCollectionInProgress: Failed command { _shardsvrCommitReshardCollection: \"reshardingDb.coll\", reshardingUUID: UUID(\"4755e8fb-35ab-4306-b832-c3a81b44b8d1\"), writeConcern: { w: \"majority\" }, $audit: { $impersonatedUsers: [ { user: \"__system\", db: \"local\" } ], $impersonatedRoles: [] } } for database 'admin' on shard 'shard1-recipient0' :: caused by :: Attempted to commit the resharding operation in an incorrect state"}}

      Thank you to chuck.zhang for discovering this issue while working on the automation restore procedure (which has the config server being started up in the aborting state for the resharding operation).

            max.hirschhorn@mongodb.com Max Hirschhorn