Core Server / SERVER-52906

moveChunk after failed migration that rolled back cloning indexes can hang indefinitely due to missing shard key index

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v5.0, v4.4
    • Sprint:
      Sharding 2020-12-28, Sharding 2021-01-11, Sharding EMEA 2021-06-14
    • Linked BF Score:
      140

      Description

      A recipient can end up being unable to process a range deletion task if:

      1. The recipient persists a "pending" range deletion doc for a migration.
      2. The recipient fails over after cloning indexes, but before the index creation is majority-committed.
      3. On step-up, the index creation is rolled back.
      4. The donor marks the recipient's range deletion doc as no longer pending.
      5. The recipient submits the range for deletion from the op observer.
      6. The range deletion fails indefinitely because the shard key index doesn't exist.

      Until the range deletion task is processed, the recipient will be unable to re-receive a chunk that overlaps the range deletion task's range.
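
      The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MongoDB's implementation: ToyRecipient, RangeDeletionTask, the index name "shard_key_1", and the integer key ranges are all hypothetical. It assumes a non-pending task is retried until it succeeds, and that an incoming chunk is refused while any outstanding task overlaps its range.

```python
from dataclasses import dataclass, field


@dataclass
class RangeDeletionTask:
    """Toy stand-in for a range deletion doc (hypothetical, not the real schema)."""
    min_key: int
    max_key: int  # exclusive upper bound
    pending: bool = True


@dataclass
class ToyRecipient:
    """Hypothetical model of a recipient shard; names are illustrative only."""
    indexes: set = field(default_factory=set)
    tasks: list = field(default_factory=list)

    def try_range_deletion(self, task, shard_key_index):
        # Assumption: pending tasks are not processed at all.
        if task.pending:
            return "skipped: task still pending"
        # Assumption: deleting the range needs the shard key index to scan it.
        if shard_key_index not in self.indexes:
            return "failed: shard key index missing, will retry"
        self.tasks.remove(task)
        return "done"

    def can_receive(self, min_key, max_key):
        # Refuse an incoming chunk while any outstanding task overlaps its range.
        return not any(
            t.min_key < max_key and min_key < t.max_key for t in self.tasks
        )


def simulate():
    recipient = ToyRecipient()
    task = RangeDeletionTask(0, 100)
    recipient.tasks.append(task)              # step 1: pending doc persisted
    recipient.indexes.add("shard_key_1")      # step 2: index cloned, not majority-committed
    recipient.indexes.discard("shard_key_1")  # step 3: index creation rolled back
    task.pending = False                      # step 4: donor clears the pending flag
    # Steps 5-6: every retry fails because the index is gone.
    attempts = [recipient.try_range_deletion(task, "shard_key_1") for _ in range(3)]
    overlapping_ok = recipient.can_receive(50, 150)
    disjoint_ok = recipient.can_receive(100, 200)
    return attempts, overlapping_ok, disjoint_ok
```

      In this model every retry returns the same failure, so the task is never removed, and a chunk overlapping [0, 100) stays blocked while a disjoint chunk is still accepted.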

              People

              Assignee:
              paolo.polato Paolo Polato
              Reporter:
              esha.maharishi Esha Maharishi
              Votes:
              0
              Watchers:
              11
