Core Server / SERVER-62332

RefineCollectionShardKeyCoordinator doesn't disallow migrations while it's executing

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.3.0
    • Affects Version/s: 5.2.0, 5.1.0
    • Component/s: Sharding
    • Fully Compatible
    • ALL
    • Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24
    • 5

      Refining a collection's shard key changes the collection's epoch, so it is not safe to run concurrently with chunk moves, splits, or merges. Currently, this synchronisation is done through the distributed lock (dist lock).
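      A minimal sketch of why the epoch change conflicts with an in-flight migration, and how a shared lock serialises the two. It is a simplified, single-threaded Python model, not server code: `CollectionMetadata`, `begin_migration`, `commit_migration`, `refine_shard_key`, and `StaleEpochError` are hypothetical names, the dist lock is stood in for by a `threading.Lock`, and the assumption that a migration commit rejects a changed epoch is only one way the conflict could surface.

```python
import threading
import uuid
from dataclasses import dataclass, field


class StaleEpochError(Exception):
    """Raised when a migration commits against metadata whose epoch has changed."""


@dataclass
class CollectionMetadata:
    """Hypothetical, simplified stand-in for a sharded collection's metadata."""
    shard_key: tuple
    epoch: str = field(default_factory=lambda: uuid.uuid4().hex)


def begin_migration(coll: CollectionMetadata) -> str:
    # The migration remembers the epoch it started under.
    return coll.epoch


def commit_migration(coll: CollectionMetadata, started_epoch: str) -> None:
    # Assumption: the commit is only valid if the epoch is unchanged.
    if coll.epoch != started_epoch:
        raise StaleEpochError(f"epoch {started_epoch} is now {coll.epoch}")


def refine_shard_key(coll: CollectionMetadata, extra_fields: tuple) -> None:
    # Refining extends the shard key and generates a new epoch.
    coll.shard_key = coll.shard_key + extra_fields
    coll.epoch = uuid.uuid4().hex


def unsynchronised_run() -> bool:
    """Refine sneaks in mid-migration; returns True if the commit was rejected."""
    coll = CollectionMetadata(shard_key=("a",))
    started = begin_migration(coll)
    refine_shard_key(coll, ("b",))
    try:
        commit_migration(coll, started)
    except StaleEpochError:
        return True
    return False


def lock_serialised_run() -> bool:
    """Both operations take the same lock; returns True if refine was kept out."""
    dist_lock = threading.Lock()  # stand-in for the dist lock
    coll = CollectionMetadata(shard_key=("a",))
    with dist_lock:  # the migration holds the lock from begin to commit
        started = begin_migration(coll)
        refine_got_lock = dist_lock.acquire(blocking=False)  # refine tries to start
        commit_migration(coll, started)  # succeeds: the epoch is untouched
    with dist_lock:  # refine runs only once the migration has finished
        refine_shard_key(coll, ("b",))
    return not refine_got_lock
```

      The model captures only the ordering argument: as long as every migration and every refine hold the same lock for their whole duration, a refine can never change the epoch between a migration's begin and commit.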

      Before the BalancerCommandsScheduler utility was introduced, the dist locks held by the balancer on behalf of moveChunk were recovered while the config server was in drain mode. Because drain mode completes before the new primary starts accepting writes, no other operation could sneak in on step-down/step-up and take the dist lock from underneath a chunk migration. Now, however, this recovery happens outside of drain mode, which leaves a window in which a refine can acquire the lock first, so refine and moveChunk can run concurrently.
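      The step-up race can be replayed deterministically with a toy lock table. This is a hypothetical Python model, not the BalancerCommandsScheduler: `DistLockTable`, `replay_step_up`, and the holder names are illustrative, and it assumes (as the description implies) that the balancer's lock has to be re-acquired after step-up, so the table starts out empty.

```python
class DistLockTable:
    """Hypothetical in-memory lock table mapping a namespace to its holder."""

    def __init__(self) -> None:
        self.holders: dict = {}

    def try_acquire(self, resource: str, who: str) -> bool:
        # Succeeds if the resource is free or already held by the same owner.
        holder = self.holders.get(resource)
        if holder is None or holder == who:
            self.holders[resource] = who
            return True
        return False


def replay_step_up(recover_in_drain_mode: bool) -> dict:
    """Replay a step-up with an in-flight migration; report who got the lock."""
    locks = DistLockTable()  # assumed empty until the balancer recovers its lock
    ns = "db.coll"
    outcome = {}
    if recover_in_drain_mode:
        # Old behaviour: recovery finishes before the node accepts writes.
        outcome["balancer"] = locks.try_acquire(ns, "balancer/moveChunk")
    # Drain mode ends and a refine request is the first to reach the lock.
    outcome["refine"] = locks.try_acquire(ns, "refineCollectionShardKey")
    if not recover_in_drain_mode:
        # New behaviour: recovery runs later and finds the lock already taken.
        outcome["balancer"] = locks.try_acquire(ns, "balancer/moveChunk")
    return outcome
```

      In the second case the migration itself is still running on the donor shard even though the balancer lost the lock, so refine and moveChunk overlap.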

      Rather than fixing the BalancerCommandsScheduler's recovery, we should keep moving toward getting rid of the dist lock and simply make RefineCollectionShardKeyCoordinator disallow migrations while it executes.
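      A minimal sketch of the proposed direction: the coordinator blocks migrations on the collection for the duration of the refine instead of relying on the dist lock. Everything in it is hypothetical: `ShardedCollection`, its `allow_migrations` flag, `refine_coordinator`, and the `drain` callback are illustrative names, and waiting for in-flight migrations to finish (rather than aborting them) is an assumption of the sketch, not a description of the actual fix.

```python
from dataclasses import dataclass
from typing import Callable


class MigrationsDisallowedError(Exception):
    """Raised when a migration is requested while migrations are blocked."""


@dataclass
class ShardedCollection:
    """Hypothetical per-collection state; field names are illustrative only."""
    shard_key: tuple
    epoch: int = 1
    allow_migrations: bool = True
    in_flight_migrations: int = 0

    def start_migration(self) -> int:
        # The balancer must check the flag before starting a migration.
        if not self.allow_migrations:
            raise MigrationsDisallowedError("migrations are disallowed")
        self.in_flight_migrations += 1
        return self.epoch

    def finish_migration(self, started_epoch: int) -> None:
        self.in_flight_migrations -= 1
        if started_epoch != self.epoch:
            raise RuntimeError("epoch changed underneath a migration")


def refine_coordinator(coll: ShardedCollection, extra_fields: tuple,
                       drain: Callable[[ShardedCollection], None]) -> None:
    """Refine the shard key while migrations are disallowed (no dist lock)."""
    coll.allow_migrations = False   # 1. no new migrations can start
    drain(coll)                     # 2. assumed: wait for in-flight ones to finish
    if coll.in_flight_migrations:
        raise RuntimeError("migrations still in flight")
    coll.shard_key += extra_fields  # 3. refine, which changes the epoch
    coll.epoch += 1
    coll.allow_migrations = True    # 4. let the balancer resume


def run_example() -> tuple:
    coll = ShardedCollection(shard_key=("a",))
    running = [coll.start_migration()]  # one migration is already in flight
    rejected = []

    def drain(c: ShardedCollection) -> None:
        # The balancer tries to start another migration during the refine...
        try:
            c.start_migration()
        except MigrationsDisallowedError:
            rejected.append("new migration rejected")
        # ...while the in-flight one completes under the old epoch.
        while running:
            c.finish_migration(running.pop())

    refine_coordinator(coll, ("b",), drain)
    return coll.shard_key, coll.epoch, coll.allow_migrations, rejected
```

      Because the block lives on the collection rather than in a lock that must be recovered, a real coordinator would also need to persist the flag so that it survives a step-down; this sketch keeps everything in memory and ignores that.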

            Jordi Serra Torrens
            Kaloian Manassiev