- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Catalog and Routing
- Fully Compatible
- ALL
- CAR Team 2025-11-24, CAR Team 2025-12-08
- 200
There is an issue caused by a race condition when a RangeDeletionTask starts while a collection rename is in progress.
RangeDeleterServiceOpObserver::onUpdate reads the already-updated RangeDeletionTask, but the metadataTracker collection UUID in metadata_manager.cpp does not match the RangeDeletionTask UUID: {{metadataTracker->metadata->getUUID()}} still holds the old UUID, as it has not been cleaned up yet.
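To make the mismatch concrete, below is a minimal, self-contained C++ sketch, not the actual server code (all type and function names here are hypothetical), of the kind of UUID comparison that fails during the race window: the observer sees the renamed task carrying the source collection's UUID, while the cached filtering metadata for the target namespace still reports its pre-rename UUID.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-ins for the real server types; UUIDs are plain strings here.
struct RangeDeletionTask {
    std::string nss;             // namespace the task points at after the rename
    std::string collectionUuid;  // UUID of the collection whose range it deletes
};

struct CachedFilteringMetadata {
    std::string uuid;  // UUID currently cached by the metadata tracker for that namespace
};

// Mimics the observer-side comparison: the task is only associated with the cached
// filtering metadata when the two UUIDs agree.
bool taskMatchesCachedMetadata(const RangeDeletionTask& task,
                               const CachedFilteringMetadata& cached) {
    return task.collectionUuid == cached.uuid;
}

int main() {
    // After the snapshot/restore/delete steps, the task already points at the target
    // namespace but carries the source collection's UUID (e49b571d... in the logs).
    RangeDeletionTask task{"random_ddl_operations_DB_1.sharded_coll_1",
                           "e49b571d-a994-4c1c-8413-f379794fde41"};

    // The filtering metadata for the target namespace has not been cleared yet, so it
    // still caches the pre-rename UUID of sharded_coll_1 (376959a7... in the logs).
    CachedFilteringMetadata cached{"376959a7-dd3c-445a-a9b1-c8763c1c1fa0"};

    if (!taskMatchesCachedMetadata(task, cached)) {
        std::cout << "UUID mismatch: task=" << task.collectionUuid
                  << " cached=" << cached.uuid << '\n';
    }
    return 0;
}
```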
Here’s the timeline:
1. The rename task begins: collection `random_ddl_operations_DB_1.sharded_coll_0` (`e49b571d-a994-4c1c-8413-f379794fde41`) is renamed to `random_ddl_operations_DB_1.sharded_coll_1` (`376959a7-dd3c-445a-a9b1-c8763c1c1fa0`).
2. It completes the steps snapshotRangeDeletionsForRename, restoreRangeDeletionTasksForRename, and deleteRangeDeletionTasksForRename.
I assume that at this point the updated RangeDeletionTask is already persisted in storage.
3. The next step is to releaseRecoverableCriticalSection and clear the filtering metadata (FilteringMetadataClearer), but there is a delay of about 300 milliseconds during which the range deleter starts:
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"aaaaaa71 uuid random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0 e49b571d-a994-4c1c-8413-f379794fde41 1|1||690e2d7897786e48dc9e644c||Timestamp(1762536824, 88) 1|0||690e2d7897786e48dc9e6444||Timestamp(1762536824, 20)"}
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"invalidateRangePreservers { _id: UUID(\"eef04cf6-c551-4bb0-a7da-93fa30d07a74\"), nss: \"random_ddl_operations_DB_1.sharded_coll_1\", collectionUuid: UUID(\"e49b571d-a994-4c1c-8413-f379794fde41\"), donorShardId: \"shard-rs1\", range: { min: { _id: MinKey }, max: { _id: MaxKey } }, processing: true, whenToClean: \"now\", timestamp: Timestamp(1762536824, 70), numOrphanDocs: 0, keyPattern: { _id: 1.0 }, preMigrationShardVersion: { e: ObjectId('690e2d7897786e48dc9e6444'), t: Timestamp(1762536824, 20), v: Timestamp(1, 0) } }"}
The range deleter reads the already-updated RangeDeletionTask for random_ddl_operations_DB_1.sharded_coll_1, which carries the source UUID (from random_ddl_operations_DB_1.sharded_coll_0), while metadataTracker->metadata->getUUID() still returns the old UUID because it has not been cleaned up yet (see the interleaving sketch after the timeline).
4. Only then does the releaseRecoverableCriticalSection and metadata cleanup (FilteringMetadataClearer) step start:
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.618+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup starts random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"}
[j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.651+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup finished random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"}
- is caused by: SERVER-96322 Treat shard version (epoch, timestamp, {0,0}) as a non-comparable version (Closed)
- related to: SERVER-114326 Range deletion tasks may reference stale collection metadata during rename operation (Needs Scheduling)