Range deletion tasks may reference stale collection metadata during rename operations

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • 🟥 DDL

      Investigate possible solutions to prevent range deletion tasks from executing with stale metadata during collection rename operations.  

      A quick fix was implemented in https://jira.mongodb.org/browse/SERVER-113667 to address https://jira.mongodb.org/browse/BF-40309, but further investigation is needed to determine whether there is a better way to synchronize range deletion task execution with collection rename operations.
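      For illustration only, a defensive check of this kind could take roughly the following shape (all names here are hypothetical stand-ins; this is a sketch, not the actual server code or the SERVER-113667 change):

      ```cpp
      // Purely illustrative sketch of a defensive check before executing a
      // range deletion task; all types and names are hypothetical stand-ins.
      #include <iostream>
      #include <optional>
      #include <string>

      struct FilteringMetadata {
          std::string collectionUuid;  // what the metadata tracker currently holds
      };

      struct RangeDeletionTask {
          std::string nss;
          std::string collectionUuid;  // UUID stored in the task document
      };

      // Execute only when the tracked metadata agrees with the task document;
      // otherwise defer until the rename has finished cleaning up the metadata.
      bool safeToExecute(const std::optional<FilteringMetadata>& tracked,
                         const RangeDeletionTask& task) {
          return tracked.has_value() && tracked->collectionUuid == task.collectionUuid;
      }

      int main() {
          // Inside the race window the tracker and the task document disagree.
          std::optional<FilteringMetadata> tracked{FilteringMetadata{"uuid-A"}};
          RangeDeletionTask task{"random_ddl_operations_DB_1.sharded_coll_1", "uuid-B"};
          std::cout << (safeToExecute(tracked, task) ? "execute" : "defer") << "\n";
      }
      ```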

      The issue is caused by a race condition that occurs when a RangeDeletionTask starts during the collection rename process.
      RangeDeleterServiceOpObserver::onUpdate reads the already-updated RangeDeletionTask, but the metadataTracker collection UUID does not match the RangeDeletionTask UUID in metadata_manager.cpp: {{metadataTracker->metadata->getUUID()}} still holds the old UUID, as it has not been cleaned up yet.
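      As a minimal toy model of that ordering (stand-in types only; the real code paths are {{RangeDeleterServiceOpObserver::onUpdate}} and metadata_manager.cpp):

      ```cpp
      // Toy model of the ordering problem: the task document is rewritten for
      // the new namespace before the filtering metadata is cleaned up, so a
      // reader in between sees an inconsistent pair. Stand-in types only.
      #include <cassert>
      #include <string>

      struct RangeDeletionTaskDoc {
          std::string nss;  // namespace recorded in config.rangeDeletions
      };

      struct MetadataTracker {
          std::string collectionUuid;  // metadataTracker->metadata->getUUID()
      };

      int main() {
          RangeDeletionTaskDoc task{"db.sharded_coll_0"};
          MetadataTracker tracker{"uuid-old"};

          // Timeline step 2 below: snapshot/restore/delete of range deletion
          // tasks has completed, so the task document names the target collection.
          task.nss = "db.sharded_coll_1";

          // Race window (~300 ms in the logs below): the range deleter observes
          // the updated task here, while the tracker has not been cleaned up.
          assert(task.nss == "db.sharded_coll_1" && tracker.collectionUuid == "uuid-old");

          // Timeline step 4: FilteringMetadataClearer finally resets the tracker.
          tracker.collectionUuid.clear();
          return 0;
      }
      ```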

      Here’s the timeline:

      1. The rename task begins.
      Collection `random_ddl_operations_DB_1.sharded_coll_0` (UUID `e49b571d-a994-4c1c-8413-f379794fde41`) is renamed to `random_ddl_operations_DB_1.sharded_coll_1` (UUID `376959a7-dd3c-445a-a9b1-c8763c1c1fa0`).
      2. The rename passes the snapshotRangeDeletionsForRename, restoreRangeDeletionTasksForRename, and deleteRangeDeletionTasksForRename steps.
      I assume that at this point the updated RangeDeletionTask is already persisted in storage.
      3. The next steps are releaseRecoverableCriticalSection and the FilteringMetadataClearer metadata cleanup, but there is a delay of about 300 milliseconds during which the range deleter starts:

      [j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"aaaaaa71 uuid random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0 e49b571d-a994-4c1c-8413-f379794fde41 1|1||690e2d7897786e48dc9e644c||Timestamp(1762536824, 88) 1|0||690e2d7897786e48dc9e6444||Timestamp(1762536824, 20)"} 
      [j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.371+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"range-deleter","msg":"invalidateRangePreservers { _id: UUID(\"eef04cf6-c551-4bb0-a7da-93fa30d07a74\"), nss: \"random_ddl_operations_DB_1.sharded_coll_1\", collectionUuid: UUID(\"e49b571d-a994-4c1c-8413-f379794fde41\"), donorShardId: \"shard-rs1\", range: { min: { _id: MinKey }, max: { _id: MaxKey } }, processing: true, whenToClean: \"now\", timestamp: Timestamp(1762536824, 70), numOrphanDocs: 0, keyPattern: { _id: 1.0 }, preMigrationShardVersion: { e: ObjectId('690e2d7897786e48dc9e6444'), t: Timestamp(1762536824, 20), v: Timestamp(1, 0) } }"}  

      The range deleter reads the already-updated RangeDeletionTask for random_ddl_operations_DB_1.sharded_coll_1, which still carries the source UUID (from random_ddl_operations_DB_1.sharded_coll_0), but metadataTracker->metadata->getUUID() still returns the old UUID, as it has not been cleaned up yet.

      4. The releaseRecoverableCriticalSection and FilteringMetadataClearer metadata-cleanup steps then start:

      [j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.618+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup starts random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"} 
      [j0:s1:prim] {"t":{"$date":"2025-11-07T17:33:46.651+00:00"},"s":"I", "c":"-", "id":0, "svc":"S", "ctx":"RenameCollectionParticipantService-0","msg":"Cleanup finished random_ddl_operations_DB_1.sharded_coll_0 e49b571d-a994-4c1c-8413-f379794fde41 to random_ddl_operations_DB_1.sharded_coll_1 376959a7-dd3c-445a-a9b1-c8763c1c1fa0"}
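
      One direction worth investigating (a sketch under assumed names, not a concrete proposal for the server code) is to gate range deletion task execution on the rename having fully completed, e.g. with a per-namespace gate that is only released after FilteringMetadataClearer has run:

      ```cpp
      // Sketch of one possible synchronization direction: gate range deletion
      // task execution on "no rename in progress" for the namespace. The gate
      // is a hypothetical stand-in for whatever primitive (recoverable
      // critical section, DDL lock) an actual fix would use.
      #include <condition_variable>
      #include <mutex>

      class RenameGate {
      public:
          void beginRename() {
              std::lock_guard<std::mutex> lk(_m);
              _renameInProgress = true;
          }

          // Called only after FilteringMetadataClearer has completed, so the
          // metadata tracker can no longer be stale when waiters proceed.
          void endRename() {
              {
                  std::lock_guard<std::mutex> lk(_m);
                  _renameInProgress = false;
              }
              _cv.notify_all();
          }

          // The range deleter blocks here before reading the task document and
          // consulting the filtering metadata, closing the ~300 ms window.
          void waitUntilNoRename() {
              std::unique_lock<std::mutex> lk(_m);
              _cv.wait(lk, [this] { return !_renameInProgress; });
          }

      private:
          std::mutex _m;
          std::condition_variable _cv;
          bool _renameInProgress = false;
      };
      ```

      The obvious trade-off is that all range deletions for the namespace stall while a rename is in flight, but that matches the intent of synchronizing the two operations.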

            Assignee: Unassigned
            Reporter: Igor Praznik
            Votes: 0
            Watchers: 3