SERVER-50890 (Core Server)

Failure to persist migration coordinator document leads to hung migration

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9.0
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Sharding 2020-10-19, Sharding 2020-11-02, Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25
    • Linked BF Score:
      24

      Description

      When starting the cloning phase of a migration, the donor shard inserts a document into the config.migrationCoordinators collection. If the insert fails and throws an exception, the MigrationSourceManager::cleanupOnError() scope guard fires and tries to complete the migration by persisting an abort decision, which is an update to the document that was never inserted. The update cannot succeed because no document matches. Persisting the decision retries on errors until a stepdown or shutdown, so until one of those occurs the migration hangs trying to update the non-existent document.
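      The failure mode can be illustrated with a small, self-contained Python model. This is only a sketch under stated assumptions, not the server's C++ code: FakeCollection, persist_decision, start_cloning, and the skip_if_not_inserted flag are all hypothetical names, a bounded attempt count stands in for "retry until stepdown or shutdown", and the guard shown is one possible mitigation rather than necessarily the fix that was applied.

```python
class StepDown(Exception):
    """Stands in for the stepdown/shutdown that finally stops the retry loop."""


class FakeCollection:
    """Hypothetical in-memory stand-in for config.migrationCoordinators."""

    def __init__(self, fail_inserts=False):
        self.docs = {}
        self.fail_inserts = fail_inserts

    def insert(self, doc):
        if self.fail_inserts:
            raise RuntimeError("insert failed")
        self.docs[doc["_id"]] = dict(doc)

    def update(self, doc_id, fields):
        """Apply fields to the matching document; return the matched count."""
        if doc_id not in self.docs:
            return 0
        self.docs[doc_id].update(fields)
        return 1


def persist_decision(coll, doc_id, decision, stepdown_after):
    """Retry the update until it matches a document or a 'stepdown' occurs."""
    attempts = 0
    while True:
        if attempts >= stepdown_after:
            raise StepDown(f"still retrying after {attempts} attempts")
        attempts += 1
        if coll.update(doc_id, {"decision": decision}) == 1:
            return attempts


def start_cloning(coll, doc_id, stepdown_after, skip_if_not_inserted):
    """Model the donor: insert the coordinator document, clean up on error."""
    inserted = False
    try:
        coll.insert({"_id": doc_id, "decision": None})
        inserted = True
        return "cloning"
    except RuntimeError:
        # Cleanup path, loosely modelled on the scope guard in the description.
        if skip_if_not_inserted and not inserted:
            # Assumed mitigation: nothing was persisted, so nothing to update.
            return "aborted without coordinator document"
        try:
            persist_decision(coll, doc_id, "aborted", stepdown_after)
            return "aborted"
        except StepDown:
            return "hung until stepdown"
```

      In this model, a failed insert without the guard ends in "hung until stepdown", because every update attempt matches zero documents. With the guard enabled, cleanup returns immediately.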

      UPDATE: Fixing this issue exposed another problem. If a migration coordinator document is left without a decision (for example, because the document insert failed to satisfy the majority write concern), then another migration on the same session that increases the transaction number causes the txnNumber bump during the next recovery to fail with a TransactionTooOld error, as can be seen in the linked BF.
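      The second problem can be sketched the same way. The model below is a simplified assumption, not the server's session logic: Session, advance, recover, and recovery_outcome are hypothetical names, and it assumes that recovery tries to move the session to the number just past the txnNumber stored in the undecided document, and that a session rejects any number lower than the last one it saw.

```python
class TransactionTooOld(Exception):
    """Raised when a txnNumber lower than the session's latest is used."""


class Session:
    """Hypothetical session that only accepts non-decreasing txnNumbers."""

    def __init__(self):
        self.last_txn_number = -1

    def advance(self, txn_number):
        if txn_number < self.last_txn_number:
            raise TransactionTooOld(
                f"txnNumber {txn_number} is older than {self.last_txn_number}"
            )
        self.last_txn_number = txn_number


def recover(session, coordinator_doc):
    """Assumed recovery step: bump past the txnNumber stored in the document."""
    session.advance(coordinator_doc["txnNumber"] + 1)


def recovery_outcome(session, coordinator_doc):
    try:
        recover(session, coordinator_doc)
        return "recovered"
    except TransactionTooOld:
        return "TransactionTooOld"


# First migration uses txnNumber 1 and leaves its document without a decision.
session = Session()
session.advance(1)
stale_doc = {"_id": "m1", "txnNumber": 1, "decision": None}

# A later migration on the same session moves the transaction number forward.
session.advance(3)

# Recovery of the stale document now attempts txnNumber 2, which is too old.
outcome = recovery_outcome(session, stale_doc)
```

      Here outcome is "TransactionTooOld": the stale document still carries the first migration's txnNumber, so the number recovery derives from it is lower than the one the session has already reached.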

              People

              Assignee:
              marcos.grillo Marcos José Grillo Ramirez
              Reporter:
              jack.mulrow Jack Mulrow
