  1. Core Server
  2. SERVER-65371

MigrationSourceManager running on secondary node may trip invariant

    • Fully Compatible
    • ALL
    • v6.0, v5.3, v5.0
    • Steps to reproduce:

      0001-Repro-BF-24832.patch

      ./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding jstests/sharding/bf-24832-repro.js --log=file
      
    • Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13
    • 48

      The shardsvr's 'moveChunk' is allowed on primary nodes only. However, this check is only best effort: the member state can change at any later point and the command will continue.
      The command body does take some precautions to ensure a stable member state. It briefly takes the GlobalLock in mode IX to:
      (1) Flag the opCtx to be killed on stepdown
      (2) Synchronize with the thread that kills opCtxs on stepdown
      This ensures that the MigrationSourceManager will run on a single term (see BF-24411). However, it doesn't ensure that this node is primary. For instance, the following interleaving could happen:
      1. The node is primary when the primary-only check is evaluated.
      2. The node then becomes secondary, before the opCtx is flagged.
      3. The opCtx is flagged as killable on stepdown, but the node has already stepped down, so it won't be interrupted!
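
      The interleaving above is a check-then-act race, and a toy model may make the window easier to see. The sketch below is not MongoDB code: `ToyNode`, `ToyOp`, and `run_move_chunk` are hypothetical names, and the stepdown is simulated deterministically with a flag rather than with real threads.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOp:
    """Hypothetical stand-in for an opCtx."""
    interrupted: bool = False


@dataclass
class ToyNode:
    """Hypothetical stand-in for a replica set member; not a MongoDB API."""
    is_primary: bool = True
    killable_ops: list = field(default_factory=list)

    def step_down(self):
        # A stepdown interrupts only the operations already flagged as killable.
        self.is_primary = False
        for op in self.killable_ops:
            op.interrupted = True


def run_move_chunk(node, op, step_down_before_flag):
    # (1) Best-effort check: the node is primary at this instant.
    if not node.is_primary:
        return "rejected"
    # (2) Simulated stepdown in the window between the check and the flag.
    if step_down_before_flag:
        node.step_down()
    # (3) Flag the operation as killable on stepdown. If the stepdown already
    #     happened, nothing will ever interrupt this operation.
    node.killable_ops.append(op)
    if op.interrupted:
        return "interrupted"
    return "continues on " + ("primary" if node.is_primary else "secondary")
```

      Calling `run_move_chunk(ToyNode(), ToyOp(), step_down_before_flag=True)` returns `"continues on secondary"`, which is the state the rest of this description starts from. With `step_down_before_flag=False`, a later `step_down()` does mark the operation as interrupted.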

      In this scenario the command will continue executing and will instantiate a MigrationSourceManager:
      4. The MSM will check that there are no migrations pending recovery. Assume that there are none at this point.
      5. Now the new primary starts a migration, inserts its recovery document and the old primary replicates it.
      6. Now the old primary evaluates this invariant, finds the document inserted in (5), and crashes.
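
      Steps 4 to 6 can be sketched the same way. This is a toy model under stated assumptions, not the server's implementation: `ToyShard`, `InvariantFailure`, and `stale_move_chunk` are hypothetical, the invariant is modelled simply as "no recovery document exists", and the `recheck_primary` guard is only one possible mitigation, not necessarily what the attached patch does.

```python
class InvariantFailure(Exception):
    """Hypothetical stand-in for a tripped invariant (a crash in the server)."""


class ToyShard:
    """Hypothetical old primary that has already stepped down."""

    def __init__(self):
        self.is_primary = False
        self.recovery_docs = []  # replicated migration recovery documents


def stale_move_chunk(shard, replicate, recheck_primary):
    # Assumed mitigation: verify primary state again once the operation is
    # guaranteed to be interrupted by any later stepdown.
    if recheck_primary and not shard.is_primary:
        return "not primary"
    # (4) Check that no migration is pending recovery; none at this point.
    if shard.recovery_docs:
        return "recovery pending"
    # (5) The new primary starts a migration and its recovery document
    #     replicates to this node.
    replicate(shard)
    # (6) The invariant now sees the document from (5) and fails.
    if shard.recovery_docs:
        raise InvariantFailure("unexpected migration recovery document")
    return "migration started"


def new_primary_migration(shard):
    # Simulates replication of the recovery document inserted by the new primary.
    shard.recovery_docs.append({"migration": "from new primary"})
```

      Without the guard, `stale_move_chunk(ToyShard(), new_primary_migration, recheck_primary=False)` raises `InvariantFailure`; with `recheck_primary=True` it returns `"not primary"` before reaching steps 4 to 6.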

        1. 0001-Repro-BF-24832.patch
          6 kB
          Jordi Serra Torrens
        2. 0001-SERVER-65371-Ensure-MigraitonSourceManager-is-only-i.patch
          2 kB
          Jordi Serra Torrens

            Assignee: Paolo Polato (paolo.polato@mongodb.com)
            Reporter: Jordi Serra Torrens (jordi.serra-torrens@mongodb.com)