Core Server / SERVER-46370

Correctly maintain receiving chunks list after shard key refine


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4.0-rc0, 4.5.1
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4
    • Sprint:
      Sharding 2020-03-09, Sharding 2020-03-23, Sharding 2020-04-06

      Description

      For the non-resumable range deleter protocol, shards track the chunks currently being received in a migration in an in-memory map, removing the range when the migration succeeds or fails. There are at least two places where a refine during a migration (which can only happen if a migration runs without the distributed lock) can cause a range to incorrectly remain in this list after the migration aborts:

      1. When setting new filtering metadata in the MetadataManager, we clear entries from the receiving chunks list that overlap with the new metadata. This comparison goes through the ChunkManager, which compares ranges using key strings; that comparison doesn't work correctly for ranges with different numbers of fields, as after a refine.
      2. In MigrationDestinationManager::_forgetReceive(), we don't remove a chunk from the receiving range list if the epoch has changed. If a refine happened during the migration, the epoch will have changed even though the range is still tracked, so we may fail to clear the received range from the receiving chunks list.

              People

              Assignee:
              matthew.saltz Matthew Saltz
              Reporter:
              jack.mulrow Jack Mulrow