  1. Core Server
  2. SERVER-101237

Investigate whether the metadata persisted by MigrationDestinationManager in config.collection_critical_sections has to be patched as part of the automated restore procedure.

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Catalog and Routing
    • CAR Team 2025-03-17, CAR Team 2025-03-31, CAR Team 2025-04-14

      The MigrationDestinationManager coordinates the recipient-side execution of a chunk migration. Between its steps, it acquires a critical section on the parent collection through the ShardingRecoveryService, including the migration session ID as part of the critical section reason.

      Since a MigrationSessionId is actually an encoding of the shard IDs participating in the chunk migration, we should verify whether the following sequence of events could cause an unexpected failure. If so, we should update the automated restore procedure to patch the content of config.collection_critical_sections:

      • A chunk migration starts; the MigrationDestinationManager acquires the critical section, inserting a document into config.collection_critical_sections whose reason field references the current donor and recipient shard IDs.
      • The automated restore procedure takes a snapshot of the cluster while the migration is in flight, including the content of the recipient's config.collection_critical_sections.
      • The snapshot is applied to a destination cluster with the same topology but different shard ID values. As part of the restore procedure, persisted metadata is patched, but the document generated by the MigrationDestinationManager is not considered.
      • After the backup is applied, the destination cluster resumes normal operations. Depending on the exact point in time (PIT) at which the snapshot was taken, the in-flight migration will be either committed or aborted; either way, the recipient (destination) shard will then finalize the operation by releasing the previously acquired critical section. Q: will the release fail due to a reason mismatch?
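
The sequence above can be modeled with a small, self-contained Python sketch. It is not server code and makes several assumptions that the investigation would have to confirm or refute: the session ID is encoded here as `<donor>_<recipient>_<suffix>`, the reason is a document with a single hypothetical `sessionId` field, the release only succeeds when the supplied reason equals the persisted one, and after the restore the releasing side rebuilds the reason from the new shard IDs. All names (`CriticalSectionStore`, `make_session_id`, `patch_reason`, `simulate`, the shard IDs) are illustrative only.

```python
import copy

# Hypothetical mapping from source-cluster shard IDs to destination-cluster shard IDs.
SHARD_ID_MAP = {"shardA": "restoredA", "shardB": "restoredB"}


def make_session_id(donor, recipient, suffix="0001"):
    # Assumed encoding; the ticket only states that the ID encodes the shard IDs.
    return f"{donor}_{recipient}_{suffix}"


def make_reason(session_id):
    # Hypothetical shape of the critical section reason.
    return {"sessionId": session_id}


def patch_reason(reason, shard_id_map):
    # Rewrite the shard IDs embedded in the session ID (assumes IDs contain no "_").
    donor, recipient, suffix = reason["sessionId"].split("_")
    return make_reason(
        make_session_id(shard_id_map[donor], shard_id_map[recipient], suffix)
    )


class CriticalSectionStore:
    """Toy stand-in for config.collection_critical_sections on one shard."""

    def __init__(self):
        self._docs = {}  # namespace -> reason

    def acquire(self, nss, reason):
        self._docs[nss] = dict(reason)

    def release(self, nss, reason):
        # Assumed strict check: release only when the reasons match exactly.
        if self._docs.get(nss) != reason:
            return False
        del self._docs[nss]
        return True

    def snapshot(self):
        return copy.deepcopy(self)

    def patch(self, shard_id_map):
        self._docs = {
            nss: patch_reason(reason, shard_id_map)
            for nss, reason in self._docs.items()
        }


def simulate(patch_critical_sections):
    """Return True if the release succeeds on the restored cluster."""
    nss = "db.coll"
    source = CriticalSectionStore()
    source.acquire(nss, make_reason(make_session_id("shardA", "shardB")))

    restored = source.snapshot()  # snapshot taken while the migration is in flight
    if patch_critical_sections:
        restored.patch(SHARD_ID_MAP)

    # Assumption: the releasing side derives the reason from the new shard IDs.
    release_reason = make_reason(make_session_id("restoredA", "restoredB"))
    return restored.release(nss, release_reason)


if __name__ == "__main__":
    print("release succeeds without patching:", simulate(False))
    print("release succeeds with patching:", simulate(True))
```

Under these assumptions the release fails unless the restore procedure also patches the critical section document. If the real release path takes its reason from persisted state rather than rebuilding it, or does not compare reasons strictly, the mismatch would not occur, which is the point the ticket asks to verify.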

            Assignee: Paolo Polato (paolo.polato@mongodb.com)
            Reporter: Paolo Polato (paolo.polato@mongodb.com)
            Votes: 0
            Watchers: 2
