Resharding operations in quiesced state should not be aborted by restore


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication
    • Fully Compatible
    • ALL
    • v8.0
    • Repl 2025-04-14
    • 200
    • None
    • 3

      SPM-2322 introduced a quiesced state for resharding operations, which follows the committing and aborting states. In this state, the metadata documents persist on a node for a default of 15 minutes so that users can query the database for the result of a resharding operation. An operation only enters the quiesced state when the user passes a reshardingUUID parameter to the command.

      In magic restore (and the restore procedure on cloud), we attempt to abort any in-progress resharding operations. We do this by searching the config.reshardingOperations collection for any document with state != committing and setting its state to aborting. However, this predicate also matches documents in the quiesced state. Since the quiesced state occurs after a resharding operation has committed or aborted, it isn't logically correct to set such a document to aborting. In the case of a successful resharding operation, the data has already been resharded.
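
      To illustrate the problem described above, here is a minimal sketch in plain Python. It is not the restore code itself: the documents are simplified stand-ins, the "_id" values and the helper names (matches_old_filter, restore_abort_pass) are hypothetical, and the filter is only assumed to be equivalent to the "state != committing" predicate.

```python
# Hypothetical, simplified metadata documents; only the "state" field matters here.
docs = [
    {"_id": "op-in-progress", "state": "cloning"},
    {"_id": "op-committing", "state": "committing"},
    {"_id": "op-quiesced", "state": "quiesced"},
]

# Assumed equivalent of the predicate described in the ticket (state != committing).
OLD_FILTER = {"state": {"$ne": "committing"}}


def matches_old_filter(doc):
    """Evaluate OLD_FILTER against a single document."""
    return doc["state"] != OLD_FILTER["state"]["$ne"]


def restore_abort_pass(documents):
    """Return copies of the documents, with every match set to 'aborting'."""
    return [
        dict(d, state="aborting") if matches_old_filter(d) else dict(d)
        for d in documents
    ]


after_restore = restore_abort_pass(docs)
states = {d["_id"]: d["state"] for d in after_restore}
```

      In this sketch the quiesced document ends up in the aborting state along with the genuinely in-progress one, which is the behavior the ticket describes as incorrect.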

      Note that when a node gets an explicit abort command for a resharding operation in the quiesced state, the node just ends the quiesce state early. The resharding operation still succeeded.

      We should modify the restore check here so that it only matches documents whose state is not in ["committing", "aborting", "quiesced"]. We should also add test cases for quiesced and aborted metadata documents to the existing restore resharding tests.
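
      The proposed check can be sketched in plain Python as follows. This is an illustration under assumptions, not the actual fix: the sample documents, their "_id" values, and the helper names (should_abort, restore_abort_pass) are hypothetical, and the filter is one possible way to express "state not in" as a MongoDB-style query using $nin.

```python
# States that restore should leave untouched, per the proposal in the ticket.
EXCLUDED_STATES = ["committing", "aborting", "quiesced"]

# One possible MongoDB-style expression of the proposed check (an assumption).
NEW_FILTER = {"state": {"$nin": EXCLUDED_STATES}}


def should_abort(doc):
    """Evaluate NEW_FILTER against a single document."""
    return doc["state"] not in NEW_FILTER["state"]["$nin"]


def restore_abort_pass(documents):
    """Return copies of the documents, with every match set to 'aborting'."""
    return [
        dict(d, state="aborting") if should_abort(d) else dict(d)
        for d in documents
    ]


# Hypothetical, simplified metadata documents covering each case.
docs = [
    {"_id": "op-in-progress", "state": "cloning"},
    {"_id": "op-committing", "state": "committing"},
    {"_id": "op-aborting", "state": "aborting"},
    {"_id": "op-quiesced", "state": "quiesced"},
]

after_restore = restore_abort_pass(docs)
states = {d["_id"]: d["state"] for d in after_restore}
```

      Here only the in-progress document changes state; the committing, aborting, and quiesced documents are left as they were. The same four cases could serve as a starting point for the test coverage mentioned above.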

      Although this is rare, the impact is that a user querying a quiesced resharding coordinator document will see an aborted operation when in fact the data has already been resharded.

            Assignee:
            Ali Mir
            Reporter:
            Ali Mir
            Votes:
            0
            Watchers:
            4
