Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
Replication
Fully Compatible
ALL
v8.0
Repl 2025-04-14
200
3

SPM-2322 introduced a quiesced state for resharding operations, which follows the committing and aborting states. It lets the operation's metadata documents persist on a node for 15 minutes by default, so that users can query the database for the results of resharding operations. This only happens when the user passes a reshardingUUID parameter to the command.
In magic restore (and the restore procedure on cloud), we attempt to abort any in-progress resharding operations. We do this by searching the config.reshardingOperations collection for any document whose state is not committing, and setting its state to aborting. However, this predicate also matches documents in the quiesced state. Because the quiesced state is only reached after a resharding operation has committed or aborted, it isn't logically correct to move such a document to aborting. For a successful resharding operation, the data has already been resharded.
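To make the overreach concrete, here is a minimal sketch that evaluates the current predicate and the proposed one against two coordinator documents. It is not the restore code: the filters are written as MQL-style dicts (`$ne`, `$nin`) purely for illustration, the tiny `matches` helper is hypothetical and supports only those two operators, and the document shapes (including the in-progress state label "cloning") are assumptions.

```python
# Hypothetical MQL-style filters over config.reshardingOperations documents.
CURRENT_FILTER = {"state": {"$ne": "committing"}}
PROPOSED_FILTER = {"state": {"$nin": ["committing", "aborting", "quiesced"]}}


def matches(doc: dict, query: dict) -> bool:
    """Evaluate a tiny MQL subset: {field: {"$ne": v}} and {field: {"$nin": [...]}}."""
    for field, cond in query.items():
        value = doc.get(field)  # a missing field reads as None, as with $ne/$nin in MQL
        for op, operand in cond.items():
            if op == "$ne" and value == operand:
                return False
            if op == "$nin" and value in operand:
                return False
            if op not in ("$ne", "$nin"):
                raise ValueError(f"unsupported operator: {op}")
    return True


# Illustrative documents: one still in progress, one already quiesced.
docs = [
    {"_id": "op-in-progress", "state": "cloning"},
    {"_id": "op-quiesced", "state": "quiesced"},
]

for doc in docs:
    # Prints: _id, matched by current filter, matched by proposed filter
    print(doc["_id"], matches(doc, CURRENT_FILTER), matches(doc, PROPOSED_FILTER))
```

The quiesced document is matched by the current filter (so restore would flip it to aborting) but not by the proposed one, while the in-progress document is matched by both.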
Note that when a node receives an explicit abort command for a resharding operation in the quiesced state, it simply ends the quiesce period early; the resharding operation still succeeded.
We should modify the restore check to match only documents whose state is not in ["committing", "aborting", "quiesced"]. We should also add test cases for quiesced and aborted metadata documents to the existing restore resharding tests.
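A minimal sketch of the restore step with the proposed check, followed by the kind of test cases suggested above. It models config.reshardingOperations as an in-memory list; the function name `abort_in_progress_resharding`, the document shapes, and the "applying" state label are hypothetical and do not reflect the actual magic restore implementation.

```python
# States the restore step should leave untouched, per the proposed fix.
SKIP_STATES = frozenset({"committing", "aborting", "quiesced"})


def abort_in_progress_resharding(docs: list[dict]) -> list[str]:
    """Set state to "aborting" on in-progress operations and return the changed _ids.

    Hypothetical stand-in for the restore step; `docs` models config.reshardingOperations.
    """
    changed = []
    for doc in docs:
        if doc.get("state") not in SKIP_STATES:
            doc["state"] = "aborting"
            changed.append(doc["_id"])
    return changed


# Quiesced and aborted metadata documents must come through restore unchanged.
docs = [
    {"_id": "in-progress", "state": "applying"},
    {"_id": "quiesced-after-commit", "state": "quiesced"},
    {"_id": "already-aborting", "state": "aborting"},
]
changed = abort_in_progress_resharding(docs)
print(changed)
print([d["state"] for d in docs])
```

Only the in-progress operation is moved to aborting; a user querying the quiesced document afterwards still sees the operation's real outcome.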
Although this is rare, the impact is that a user querying a quiesced resharding coordinator document will see an aborted operation when, in fact, the data has already been resharded.
Is caused by: SERVER-100421 Resharding failure leads to all values inserted as zeroes in atlas log ingestion (Closed)