Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Team: Catalog and Routing
Labels: 🟥 DDL
Investigate possible solutions to prevent range deletion tasks from executing with stale metadata during collection rename operations.
A quick fix has been implemented in https://jira.mongodb.org/browse/SERVER-113667 to address https://jira.mongodb.org/browse/BF-40309, but further investigation is needed to determine if there is a better approach to synchronize range deletion task execution with collection rename operations.
The issue is a race condition that occurs when a RangeDeletionTask starts executing during the collection rename process.
RangeDeleterServiceOpObserver::onUpdate reads the already-updated RangeDeletionTask, but the metadataTracker collection UUID does not match the RangeDeletionTask UUID in metadata_manager.cpp: {{metadataTracker->metadata->getUUID()}} still holds the old UUID, since the filtering metadata has not been cleaned up yet.
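The snippet below is a minimal, self-contained sketch of that mismatch. The struct and function names are simplified stand-ins rather than the real server types in metadata_manager.cpp, and the UUID strings are placeholders, not the concrete values from the logs further down.

```cpp
// Hedged sketch: the task already carries the post-rename state, while the
// cached filtering metadata still describes the pre-rename collection.
#include <iostream>
#include <optional>
#include <string>

struct RangeDeletionTask {
    std::string nss;            // namespace the updated task now points at
    std::string collectionUuid; // UUID carried over from the source collection
};

struct CollectionMetadata {
    std::string uuid; // UUID the cached filtering metadata was built for
};

struct MetadataTracker {
    std::optional<CollectionMetadata> metadata; // stale until FilteringMetadataClearer runs
};

// Stand-in for the comparison performed when invalidating range preservers:
// if the cached metadata still describes a different collection, the UUIDs disagree.
bool uuidsMatch(const MetadataTracker& tracker, const RangeDeletionTask& task) {
    return tracker.metadata && tracker.metadata->uuid == task.collectionUuid;
}

int main() {
    // After snapshot/restore/deleteRangeDeletionTasksForRename, the persisted task
    // already refers to the target namespace.
    RangeDeletionTask task{"db.target_coll", "uuid-carried-by-updated-task"};

    // The filtering metadata for that namespace has not been cleared yet,
    // so the tracker still holds the pre-rename UUID.
    MetadataTracker tracker{CollectionMetadata{"stale-pre-rename-uuid"}};

    std::cout << "UUIDs match: " << std::boolalpha << uuidsMatch(tracker, task) << "\n"; // false
    return 0;
}
```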
Here’s the timeline:
1. The rename task begins: collection `random_ddl_operations_DB_1.sharded_coll_0` (`e49b571d-a994-4c1c-8413-f379794fde41`) is renamed to `random_ddl_operations_DB_1.sharded_coll_1` (`376959a7-dd3c-445a-a9b1-c8763c1c1fa0`).
2. It passes through the snapshotRangeDeletionsForRename, restoreRangeDeletionTasksForRename, and deleteRangeDeletionTasksForRename steps.
I assume that at this point the updated RangeDeletionTask is already available in storage.
3. The next step is to releaseRecoverableCriticalSection and to clean up the filtering metadata (FilteringMetadataClearer), but there is a delay of about 300 milliseconds during which the range deleter starts:
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"aaaaaa71 uuid random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0 e49b571d-a994-4c1c-8413-f379794fde41 1|1||690e2d7897786e48dc9e644c||Timestamp(1762536824, 88) 1|0||690e2d7897786e48dc9e6444||Timestamp(1762536824, 20)"}
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"invalidateRangePreservers { _id: UUID(\"eef04cf6-c551-4bb0-a7da-93fa30d07a74\"), nss: \"random_ddl_operations_DB_1.sharded_coll_1\", collectionUuid: UUID(\"e49b571d-a994-4c1c-8413-f379794fde41\"), donorShardId: \"shard-rs1\", range: { min: { _id: MinKey }, max: { _id: MaxKey } }, processing: true, whenToClean: \"now\", timestamp: Timestamp(1762536824, 70), numOrphanDocs: 0, keyPattern: { _id: 1.0 }, preMigrationShardVersion: { e: ObjectId('690e2d7897786e48dc9e6444'), t: Timestamp(1762536824, 20), v: Timestamp(1, 0) } }"}
The range deleter reads the already-updated RangeDeletionTask for random_ddl_operations_DB_1.sharded_coll_1, which still carries the source collection's UUID (from random_ddl_operations_DB_1.sharded_coll_0), while metadataTracker->metadata->getUUID() still returns the old UUID because the metadata has not yet been cleaned up (see the sketch after this timeline).
4. The releaseRecoverableCriticalSection and metadata cleanup (FilteringMetadataClearer) step then starts:
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.618+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup starts random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"}
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.651+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup finished random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"}
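As a rough illustration of the interleaving in steps 2-4, the toy program below runs a "rename participant" thread that delays its metadata cleanup by about 300 ms and a "range deleter" thread that fires inside that window. The thread functions, sleeps, and UUID strings are illustrative assumptions only, not real server code or scheduling.

```cpp
// Toy, self-contained illustration of the ~300 ms window described above.
#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

std::mutex mtx;
std::string taskUuid      = "uuid-carried-by-updated-task"; // what the RangeDeletionTask carries
std::string filteringUuid = "stale-pre-rename-uuid";        // what the cached filtering metadata still holds

void renameParticipant() {
    // snapshot/restore/deleteRangeDeletionTasksForRename are already done, so the
    // persisted task refers to the target namespace; the filtering metadata cleanup
    // (FilteringMetadataClearer) only happens after a delay.
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
    std::lock_guard<std::mutex> lk(mtx);
    filteringUuid = taskUuid; // cleanup finally brings the two views back in sync
    std::cout << "rename participant: filtering metadata cleaned up\n";
}

void rangeDeleter() {
    // Fires inside the window, before the cleanup above has run.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::lock_guard<std::mutex> lk(mtx);
    if (filteringUuid != taskUuid) {
        std::cout << "range deleter: UUID mismatch, invalidation would see stale metadata\n";
    }
}

int main() {
    std::thread rename(renameParticipant);
    std::thread deleter(rangeDeleter);
    deleter.join();
    rename.join();
    return 0;
}
```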
- is related to: SERVER-113667 "Skip RangeDeleter invalidation when the UUID of the CSR mismatch the UUID of the RangeDeletionTask" (Closed)