  1. Core Server
  2. SERVER-65371

MigrationSourceManager running on secondary node may trip invariant


Details

    • Fully Compatible
    • ALL
    • v6.0, v5.3, v5.0
    • 0001-Repro-BF-24832.patch

      ./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding jstests/sharding/bf-24832-repro.js --log=file
    • Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13
    • 48

    Description

      The shardsvr's 'moveChunk' command is allowed on primary nodes only. However, this check is only best effort: the member state could change at any time afterwards and the command would still continue.
      The command body does take some precautions to ensure a stable member state: It briefly takes the GlobalLock in mode IX to:
      (1) Flag the opCtx to be killed on stepdown
      (2) Synchronize with the thread that kills opCtxs on stepdown
      This ensures that the MigrationSourceManager will run within a single term (see BF-24411). However, it doesn't ensure that this node is primary. For instance, the following interleaving could happen:
      1. The node is primary when this is evaluated
      2. The node becomes secondary here
      3. Here the opCtx will get flagged as killable on stepdown, but the node has already stepped down, so it won't be interrupted!
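
The interleaving above can be sketched as a small deterministic simulation. This is only a toy model, not server code: `OpCtx`, `Node`, `step_down` and `run_move_chunk` are hypothetical names invented for illustration, and the stepdown killer is assumed to interrupt only operations that are already flagged when it runs.

```python
from dataclasses import dataclass, field


@dataclass
class OpCtx:
    """Hypothetical stand-in for an operation context."""
    kill_on_stepdown: bool = False
    killed: bool = False


@dataclass
class Node:
    """Toy replica-set member; all names here are illustrative."""
    is_primary: bool = True
    ops: list = field(default_factory=list)

    def step_down(self):
        # Assumption: the stepdown killer only interrupts operations
        # that are already flagged at the moment it runs.
        self.is_primary = False
        for op in self.ops:
            if op.kill_on_stepdown:
                op.killed = True


def run_move_chunk(node, step_down_before_flagging):
    op = OpCtx()
    node.ops.append(op)

    # Step 1: best-effort primary check at command entry.
    if not node.is_primary:
        raise RuntimeError("not primary")

    # Step 2: in the racy ordering, the node steps down right here.
    if step_down_before_flagging:
        node.step_down()

    # Step 3: the operation is flagged as killable on stepdown.
    op.kill_on_stepdown = True

    # In the benign ordering, the stepdown happens after the flag is set.
    if not step_down_before_flagging:
        node.step_down()

    return op


racy = run_move_chunk(Node(), step_down_before_flagging=True)
benign = run_move_chunk(Node(), step_down_before_flagging=False)
print(racy.killed, benign.killed)  # False True
```

In the racy ordering the operation is never interrupted, even though the node is no longer primary by the time the flag is set.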

      In this scenario the command will continue executing and will instantiate a MigrationSourceManager:
      4. The MSM will check that there are no migrations pending recovery. Assume that there are none at this point.
      5. Now the new primary starts a migration, inserts its recovery document and the old primary replicates it.
      6. Now the old primary evaluates this invariant, finds the document inserted on (5) and crashes.
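
Steps 4 to 6 amount to a check followed later by an invariant on the same condition, with replication able to change the data in between. A minimal sketch under that reading follows; `InvariantFailure`, `run_msm_on_old_primary` and the document shape are hypothetical and do not reflect the real MigrationSourceManager code.

```python
class InvariantFailure(Exception):
    """Hypothetical stand-in for a tripped server invariant."""


def run_msm_on_old_primary(recovery_docs, replicate_between_checks):
    # Step 4: check that no migration is pending recovery.
    if recovery_docs:
        raise RuntimeError("migration pending recovery")

    # Step 5: the new primary starts a migration and its recovery
    # document is replicated to this (now secondary) node.
    if replicate_between_checks:
        recovery_docs.append({"_id": "migration-started-by-new-primary"})

    # Step 6: the invariant assumes the result of step 4 still holds.
    if recovery_docs:
        raise InvariantFailure("unexpected migration recovery document")
    return "ok"


print(run_msm_on_old_primary([], replicate_between_checks=False))  # ok
try:
    run_msm_on_old_primary([], replicate_between_checks=True)
    tripped = False
except InvariantFailure:
    tripped = True
print(tripped)  # True
```

The invariant can only hold if nothing else writes recovery documents between the two checks, which is true on a primary but not on a node that is replicating from a new primary.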

            People

              paolo.polato@mongodb.com Paolo Polato
              jordi.serra-torrens@mongodb.com Jordi Serra Torrens
              Votes: 0
              Watchers: 3
