  1. Core Server
  2. SERVER-55557

Range deletion of aborted migration can fail after a refine shard key

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1.0
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding EMEA 2021-05-31
    • Linked BF Score: 27

      Description

      At the end of _configSvrRefineCollectionShardKey, the command triggers a best-effort, fire-and-forget refresh on the shards that own chunks. Because the refresh is best effort, there is no guarantee that the shards actually refresh.
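
      To illustrate why a best-effort refresh can leave a shard stale, here is a toy model in Python. It is a sketch only, not server code: `Shard`, `refresh`, and `refine_shard_key` are hypothetical names, and the "unreachable" flag stands in for any reason a refresh might not complete.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical stand-in for a shard's cached collection metadata."""
    name: str
    cached_key: tuple
    reachable: bool = True

    def refresh(self, authoritative_key: tuple) -> None:
        # Assumption: a refresh either fully succeeds or raises.
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        self.cached_key = authoritative_key


def refine_shard_key(shards: list, new_key: tuple) -> None:
    """Ask every shard to refresh after the key is refined (fire and forget)."""
    for shard in shards:
        try:
            shard.refresh(new_key)
        except ConnectionError:
            # Best effort: the failure is dropped, nothing retries it.
            pass


shard_a = Shard("shardA", ("a",))
shard_b = Shard("shardB", ("a",), reachable=False)
refine_shard_key([shard_a, shard_b], ("a", "b"))
print(shard_a.cached_key, shard_b.cached_key)
```

      In this model shardB keeps the old key in its cache after the refine, which is the starting condition for the failure described next.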

      Consider a shard that has cached metadata for the collection but has not successfully refreshed since the refineCollectionShardKey. If this shard is later the recipient of a chunk migration that gets aborted, then when it executes the range deletion it will believe the collection still has the old shard key. However, the range boundaries in the task use the new, refined shard key, so the call to KeyPattern::extendRangeBound made during range deletion fails.
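
      The mismatch can be reproduced with a simplified Python model of extending a range bound to a key pattern. This is an illustrative sketch under stated assumptions, not the actual KeyPattern::extendRangeBound implementation: `extend_range_bound` and `MIN_KEY` are hypothetical, only lower bounds are handled, and the error type is arbitrary. The assumed rule is that a bound must be a field-by-field prefix of the key pattern, with missing trailing fields padded by a minimum value.

```python
MIN_KEY = "MinKey"  # stand-in for a BSON MinKey value (assumption)


def extend_range_bound(key_pattern: dict, bound: dict) -> dict:
    """Pad `bound` with MIN_KEY so it covers every field of `key_pattern`.

    Raises ValueError if `bound` has more fields than the pattern or if
    the field names do not line up in order.
    """
    pattern_fields = list(key_pattern)
    bound_fields = list(bound)

    if len(bound_fields) > len(pattern_fields):
        raise ValueError(
            f"key pattern {key_pattern} is shorter than bound {bound}")
    for pattern_field, bound_field in zip(pattern_fields, bound_fields):
        if pattern_field != bound_field:
            raise ValueError(
                f"bound {bound} does not match key pattern {key_pattern}")

    extended = dict(bound)
    for field in pattern_fields[len(bound_fields):]:
        extended[field] = MIN_KEY
    return extended


stale_key = {"a": 1}            # what the stale shard still has cached
refined_key = {"a": 1, "b": 1}  # the key after refineCollectionShardKey
task_min = {"a": 10, "b": 5}    # range deletion task bound, refined key

# With up-to-date metadata the bound is accepted unchanged.
print(extend_range_bound(refined_key, task_min))

# With stale metadata the bound is longer than the cached key pattern.
try:
    extend_range_bound(stale_key, task_min)
except ValueError as err:
    print("range deletion would fail:", err)
```

      Under these assumptions, the failure goes away once the shard refreshes its metadata before processing the task, since the bound then matches the key pattern it is checked against.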

              People

              Assignee: Jordi Serra Torrens (jordi.serra-torrens)
              Reporter: Jordi Serra Torrens (jordi.serra-torrens)
