Core Server / SERVER-62332

RefineCollectionShardKeyCoordinator doesn't disallow migrations while it's executing

Details

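    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 5.2.0, 5.1.0
    • Fix Version/s: 5.3.0
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24
    • Story Points: 5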

    Description

      Refining a collection's shard key changes the collection's epoch, so it is not safe to run concurrently with moveChunk, splitChunk, or mergeChunks. Currently, this synchronisation is done through the distributed lock (dist lock).
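      A toy model of why the two operations conflict, assuming (for illustration only) that a migration caches the collection epoch when it starts and validates it when it commits. `CollectionMetadata`, `Migration`, `refine_shard_key`, and `StaleEpochError` are hypothetical names, not MongoDB's internal types.

```python
import uuid
from dataclasses import dataclass, field


class StaleEpochError(Exception):
    """Hypothetical error: an operation's cached epoch no longer matches the collection's."""


@dataclass
class CollectionMetadata:
    # Hypothetical stand-in for a sharded collection's routing metadata.
    shard_key: tuple
    epoch: str = field(default_factory=lambda: uuid.uuid4().hex)


def refine_shard_key(coll: CollectionMetadata, extra_fields: tuple) -> None:
    # Refining extends the shard key and, per the description, changes the epoch.
    coll.shard_key = coll.shard_key + extra_fields
    coll.epoch = uuid.uuid4().hex


class Migration:
    def __init__(self, coll: CollectionMetadata) -> None:
        self.coll = coll
        # Assumption: the migration remembers the epoch it observed at start.
        self.start_epoch = coll.epoch

    def commit(self) -> str:
        # The commit is only valid against the metadata the migration started from.
        if self.coll.epoch != self.start_epoch:
            raise StaleEpochError(
                f"epoch changed from {self.start_epoch} to {self.coll.epoch}"
            )
        return "committed"


coll = CollectionMetadata(shard_key=("a",))
m = Migration(coll)
refine_shard_key(coll, ("b",))  # a refine sneaks in while the migration is in flight
try:
    m.commit()
except StaleEpochError as e:
    print("migration aborted:", e)
```

      In this sketch the stale epoch is at least detected at commit; the point is only that both operations read and rewrite the same versioned metadata, which is why they have to be serialized.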

      Before the BalancerCommandsScheduler utility was introduced, the dist locks held by the balancer (on behalf of moveChunk) were recovered while the config server was in drain mode. This guaranteed that, across a step-down/step-up, no other operation could sneak in and take the dist lock out from under an in-progress chunk migration. Now, however, the recovery happens outside of drain mode, so a refine and a moveChunk can end up running concurrently.
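      A deterministic sketch of the step-up race, assuming a hypothetical in-memory `DistLockManager` whose lock table starts empty on the new primary and has to be repopulated by the balancer's recovery. It replays the unlucky ordering: when recovery runs outside drain mode, refine can acquire the collection's lock before the balancer re-acquires it for the migration that is still running. All names here are illustrative, not MongoDB's API.

```python
class DistLockManager:
    """Hypothetical in-memory model of per-collection dist locks (not MongoDB's API)."""

    def __init__(self) -> None:
        self.holders: dict[str, str] = {}

    def try_lock(self, name: str, owner: str) -> bool:
        # Succeeds if the lock is free or already held by the same owner.
        current = self.holders.get(name)
        if current is not None and current != owner:
            return False
        self.holders[name] = owner
        return True


def refine_overlaps_migration(recover_in_drain_mode: bool) -> bool:
    """Replay a step-up with a migration in flight; report whether refine got in."""
    nss = "db.coll"
    # Assumption: the new primary starts with an empty lock table.
    locks = DistLockManager()
    # The migration keeps running across the config server's step-down/step-up.
    migration_in_flight = True

    if recover_in_drain_mode:
        # Recovery runs before new operations are accepted, so the balancer wins.
        locks.try_lock(nss, "balancer")
        refine_acquired = locks.try_lock(nss, "refineCollectionShardKey")
    else:
        # Recovery runs after the node accepts operations; replay the ordering
        # in which the refine request arrives first.
        refine_acquired = locks.try_lock(nss, "refineCollectionShardKey")
        locks.try_lock(nss, "balancer")  # fails: refine already holds the lock

    return refine_acquired and migration_in_flight


print("drain-mode recovery:", refine_overlaps_migration(True))   # prints "drain-mode recovery: False"
print("post-drain recovery:", refine_overlaps_migration(False))  # prints "post-drain recovery: True"
```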

      Rather than fixing the BalancerCommandsScheduler's recovery, we should keep moving toward removing the dist lock altogether and simply make the RefineCollectionShardKeyCoordinator disallow migrations while it executes.
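      A minimal sketch of the proposed approach, assuming a hypothetical in-memory `Collection` with an `allow_migrations` flag: the refine coordinator first stops new migrations, then waits for in-flight ones to drain, performs the epoch-changing update, and finally re-enables migrations. A `threading.Condition` stands in for whatever durable state and notification the real coordinator uses; `Collection`, `start_migration`, `finish_migration`, and `refine_shard_key` are illustrative names, not MongoDB's.

```python
import threading
import time


class Collection:
    """Hypothetical in-memory stand-in for a sharded collection's metadata."""

    def __init__(self) -> None:
        self.allow_migrations = True
        self.epoch = 1
        self.active_migrations = 0
        self.cv = threading.Condition()


def start_migration(coll: Collection) -> bool:
    # The balancer/moveChunk path checks the flag before starting a migration.
    with coll.cv:
        if not coll.allow_migrations:
            return False
        coll.active_migrations += 1
        return True


def finish_migration(coll: Collection) -> None:
    with coll.cv:
        coll.active_migrations -= 1
        coll.cv.notify_all()  # wake a coordinator waiting for migrations to drain


def refine_shard_key(coll: Collection) -> None:
    with coll.cv:
        coll.allow_migrations = False  # 1. disallow new migrations
        coll.cv.wait_for(lambda: coll.active_migrations == 0)  # 2. drain in-flight ones
        try:
            coll.epoch += 1  # 3. the epoch-changing refine
        finally:
            coll.allow_migrations = True  # 4. allow migrations again


coll = Collection()
assert start_migration(coll)  # a migration is already running
refiner = threading.Thread(target=refine_shard_key, args=(coll,))
refiner.start()
while coll.allow_migrations:  # wait until the coordinator has disallowed migrations
    time.sleep(0.01)
blocked = not start_migration(coll)  # a new migration is rejected
finish_migration(coll)  # the in-flight migration completes...
refiner.join()  # ...which lets the refine proceed
print(blocked, coll.epoch, coll.allow_migrations)  # prints "True 2 True"
```

      The design point is that the exclusion becomes part of the coordinator's own state instead of depending on who re-acquires a lock first after a step-up. A real coordinator would presumably need the flag to survive failover, which this in-memory sketch does not model.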


            People

              Assignee: Jordi Serra Torrens
              Reporter: Kaloian Manassiev
              Votes: 0
              Watchers: 3
