Core Server / SERVER-48198

Migration recovery may recover incorrect decision after shard key refine


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5.1, 4.4.0-rc7
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v4.4
    • Sprint: Sharding 2020-05-18
    • Linked BF Score: 28

      Description

      When a new primary steps up in a shardsvr replica set, it launches a task to recover any migrations driven by the node's shard that were in progress when the previous primary stepped down. As part of this, the recovery process determines the outcome of each migration by loading the latest metadata from the config server and checking whether the minimum bound of the migrated range still belongs to the donor shard. If it does, the migration is assumed to have aborted, and the recovery process updates the persisted range deleter state on the donor and recipient shards so any orphans on either are deleted.
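
      The shape of that decision can be sketched roughly as follows. This is a minimal sketch, not the literal recovery code: the helper name and signature are hypothetical, though the ChunkManager lookup mirrors how the server resolves which shard owns a given key.

      {code:cpp}
      #include "mongo/s/chunk_manager.h"

      namespace mongo {

      // Sketch only: the recovery task decides a migration's outcome by asking
      // whether the persisted min bound of the migrated range still belongs to
      // the donor in the freshly fetched routing table.
      bool recoveredMigrationAborted(const ChunkManager& latestRoutingTable,
                                     const BSONObj& migrationMinBound,
                                     const ShardId& donorShardId) {
          // Find the chunk owning the migration's min bound in the latest
          // metadata loaded from the config server.
          const auto chunk =
              latestRoutingTable.findIntersectingChunkWithSimpleCollation(migrationMinBound);

          // If the donor still owns that bound, the commit is assumed to have
          // never happened and the migration is treated as aborted.
          return chunk.getShardId() == donorShardId;
      }

      }  // namespace mongo
      {code}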

      If an interrupted migration committed successfully and its namespace had its shard key refined before the recovery process runs, the ownership check will compare the pre-refine min bound against a post-refine routing table. This may produce a spurious overlap, leading the recovery process to incorrectly conclude that the migration aborted and preventing any orphans on the donor from being cleaned up. The recipient will then attempt to schedule a range deletion for the received range, which fails with RangeOverlapConflict.

      To fix this, the recovery process should extend the migration's min bound when performing the ownership check if the most recent shard key has more fields, as was done in SERVER-46386.
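
      A minimal sketch of that extension, assuming a hypothetical helper name (the actual change follows the pattern from SERVER-46386): it pads the persisted min bound with MinKey for any fields the refine added, so the ownership check compares bounds with the same fields.

      {code:cpp}
      #include "mongo/bson/bsonobj.h"
      #include "mongo/bson/bsonobjbuilder.h"

      namespace mongo {

      // Sketch only: extend a pre-refine min bound to match the current shard
      // key pattern by padding new fields with MinKey.
      BSONObj extendMinBoundToMatchShardKey(const BSONObj& persistedMinBound,
                                            const BSONObj& currentShardKeyPattern) {
          BSONObjBuilder extended;
          BSONObjIterator keyField(currentShardKeyPattern);
          BSONObjIterator boundField(persistedMinBound);
          while (keyField.more()) {
              const auto field = keyField.next();
              if (boundField.more()) {
                  // Field existed before the refine; keep the persisted value.
                  extended.appendAs(boundField.next(), field.fieldNameStringData());
              } else {
                  // Field was added by the refine; pad with MinKey, matching how
                  // refineCollectionShardKey extends existing chunk min bounds.
                  extended.appendMinKey(field.fieldNameStringData());
              }
          }
          return extended.obj();
      }

      }  // namespace mongo
      {code}

      For example, with a key refined from {x: 1} to {x: 1, y: 1}, a persisted min bound of {x: 0} becomes {x: 0, y: MinKey}, matching the post-refine chunk bounds, so the ownership check no longer produces a spurious overlap.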


      People

      Assignee: Jack Mulrow (jack.mulrow)
      Reporter: Jack Mulrow (jack.mulrow)
      Votes: 0
      Watchers: 4
