Core Server / SERVER-38344

Early release of distributed database locks during initial collection sharding results in migration/split failures

    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v3.4
    • Sprint: Sharding 2019-01-14, Sharding 2019-02-25

      This is for 3.6, with backport requested to 3.4.

      When sharding multiple collections simultaneously, "[t]he shardCollection distributed locks are only held until the initial 'large' chunks are created and they are dropped just before we run the migrateAndFurtherSplitInitialChunks stage, which is what spreads them evenly across the cluster. Because the dist lock is dropped, the second collection's sharding begins and then we start having migrations that run in parallel on the same shard (the primary shard) and this is not allowed, so one of the initial splits would fail." As a result, the initial chunks can end up unevenly distributed across the shards.
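
      The interleaving described above can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions: it does not use any MongoDB code or API, and the names `shard_collection`, `run_round_robin`, `simulate`, the namespaces `db.a` and `db.b`, and the two-tick migration length are all hypothetical. It models a single database dist lock that is released before the migration stage, and a primary shard that allows only one migration at a time.

```python
# Toy model only: names and tick counts are illustrative, not MongoDB internals.

def shard_collection(ns, db_lock, primary, failures):
    """One shardCollection run, written as a generator so runs can interleave."""
    while db_lock["holder"] is not None:
        yield  # wait for the database dist lock
    db_lock["holder"] = ns
    yield  # create the initial large chunks while holding the lock
    db_lock["holder"] = None  # early release, before any migration runs
    for chunk in range(2):
        if primary["migrating"] is None:
            primary["migrating"] = ns
            yield  # migration in flight (tick 1)
            yield  # migration in flight (tick 2)
            primary["migrating"] = None
        else:
            # The primary shard is already running a migration: this one fails.
            failures.append((ns, chunk))
            yield


def run_round_robin(tasks):
    """Advance each task by one step per round until all have finished."""
    pending = list(tasks)
    while pending:
        for task in list(pending):
            try:
                next(task)
            except StopIteration:
                pending.remove(task)


def simulate():
    db_lock = {"holder": None}      # who holds the database dist lock
    primary = {"migrating": None}   # namespace currently migrating off the primary
    failures = []
    run_round_robin([
        shard_collection("db.a", db_lock, primary, failures),
        shard_collection("db.b", db_lock, primary, failures),
    ])
    return failures


if __name__ == "__main__":
    print(simulate())
```

      In this model the second collection acquires the lock as soon as the first releases it, so both of its migration attempts land while the primary shard is still busy with the first collection and are recorded as failures.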

      The initially proposed solution is to "[release] the collection dist lock before the moves, but leave the two database locks held until after the migrates and splits have completed. This will work because split and migrate only takes collection dist lock."
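
      The proposed lock ordering can be sketched as follows. This is a hedged illustration, not the server implementation: `dist_lock`, `shard_collection_proposed`, and the lock names `db-lock-1`, `db-lock-2`, and `coll:db.a` are hypothetical stand-ins, and the ticket does not say which two database locks are meant. The sketch only records the order of acquire and release events: the collection lock is released before the moves, the migrate/split stage takes the collection lock itself, and the database locks are released last.

```python
from contextlib import ExitStack, contextmanager


@contextmanager
def dist_lock(name, held, events):
    """Hypothetical stand-in for a distributed lock; records acquire/release order."""
    if name in held:
        raise RuntimeError(f"lock {name!r} is already held")
    held.add(name)
    events.append(("acquire", name))
    try:
        yield
    finally:
        held.discard(name)
        events.append(("release", name))


def shard_collection_proposed(ns, db_lock_names, held, events):
    """Hold the database locks for the whole operation, the collection lock only briefly."""
    coll_lock = f"coll:{ns}"
    with ExitStack() as db_locks:
        for name in db_lock_names:
            db_locks.enter_context(dist_lock(name, held, events))
        # Collection lock covers only the creation of the initial chunks.
        with dist_lock(coll_lock, held, events):
            events.append(("create_initial_chunks", ns))
        # Split and migrate take the collection lock themselves, so it must be free here.
        with dist_lock(coll_lock, held, events):
            events.append(("migrate_and_split", ns))
    # Database locks are released only after the migrates and splits are done.


if __name__ == "__main__":
    held, events = set(), []
    shard_collection_proposed("db.a", ("db-lock-1", "db-lock-2"), held, events)
    for event in events:
        print(event)
```

      Because the database locks stay held until the end, a second shardCollection on the same database would have to wait until the first collection's migrations finish, which is the behavior the proposal relies on.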

      This is not an issue in 4.0.4.

            blake.oler@mongodb.com Blake Oler
            eric.sedor@mongodb.com Eric Sedor