  1. Core Server
  2. SERVER-38344

Early release of distributed database locks during initial collection sharding results in migration/split failures

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Sprint:
      Sharding 2019-01-14, Sharding 2019-02-25

      Description

      This is for 3.6, with backport requested to 3.4.

      When sharding multiple collections simultaneously, "[t]he shardCollection distributed locks are only held until the initial 'large' chunks are created and they are dropped just before we run the migrateAndFurtherSplitInitialChunks stage, which is what spreads them evenly across the cluster. Because the dist lock is dropped, the second collection's sharding begins and then we start having migrations that run in parallel on the same shard (the primary shard) and this is not allowed, so one of the initial splits would fail." As a result, the initial chunks can end up unevenly distributed across the shards.

      The initially proposed solution is to "[release] the collection dist lock before the moves, but leave the two database locks held until after the migrates and splits have completed. This will work because split and migrate only takes collection dist lock."
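
      The difference between the two lock orderings can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the server's implementation: DistLocks, shard_collection, run and max_parallel_migrations are hypothetical names, a single database lock stands in for the database-level locks, and each "yield" marks a point where the other shardCollection operation may run.

```python
from typing import Dict, Generator, List


class DistLocks:
    """Toy stand-in for distributed locks: maps a lock name to its holder."""

    def __init__(self) -> None:
        self.holders: Dict[str, str] = {}

    def try_acquire(self, name: str, owner: str) -> bool:
        # Succeeds if the lock is free or already held by this owner.
        return self.holders.setdefault(name, owner) == owner

    def release(self, name: str, owner: str) -> None:
        if self.holders.get(name) == owner:
            del self.holders[name]


def shard_collection(locks: DistLocks, db: str, coll: str,
                     hold_db_lock: bool, log: List[str],
                     moves: int = 3) -> Generator[None, None, None]:
    """Hypothetical model of one shardCollection operation."""
    db_lock, coll_lock = f"db:{db}", f"coll:{db}.{coll}"
    while not locks.try_acquire(db_lock, coll):
        yield  # wait for the database lock
    locks.try_acquire(coll_lock, coll)
    log.append(f"{coll}: create initial chunks")
    yield
    # The collection lock is always released before the moves, because the
    # moves take the collection lock themselves.
    locks.release(coll_lock, coll)
    if not hold_db_lock:
        locks.release(db_lock, coll)  # early release: the reported behavior
    log.append(f"{coll}: migrate start")
    for _ in range(moves):
        locks.try_acquire(coll_lock, coll)  # each move takes only this lock
        yield
        locks.release(coll_lock, coll)
    log.append(f"{coll}: migrate end")
    locks.release(db_lock, coll)  # no-op if it was already released


def run(hold_db_lock: bool) -> List[str]:
    """Interleave two shardCollection operations on the same database."""
    locks, log = DistLocks(), []
    ops = [shard_collection(locks, "test", c, hold_db_lock, log)
           for c in ("a", "b")]
    while ops:
        for op in list(ops):
            try:
                next(op)
            except StopIteration:
                ops.remove(op)
    return log


def max_parallel_migrations(log: List[str]) -> int:
    """Peak number of migration phases that were active at the same time."""
    active = peak = 0
    for entry in log:
        if entry.endswith("migrate start"):
            active += 1
            peak = max(peak, active)
        elif entry.endswith("migrate end"):
            active -= 1
    return peak
```

      In this toy model, run(False) lets the second collection start its migrations while the first is still moving chunks, whereas run(True) serializes the two migration phases, which is the effect the proposed change aims for.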

      This is not an issue in 4.0.4.
