  1. Core Server
  2. SERVER-47422

Use NamespaceSerializer when taking distributed locks for refineCollectionShardKey and migrations

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4
    • Sprint:
      Sharding 2020-04-20, Sharding 2020-05-04, Sharding 2020-05-18
    • Linked BF Score:
      35

      Description

      Before executing, _configsvrRefineCollectionShardKey takes distributed locks on both the sharded collection's database and its full namespace. The namespace lock conflicts with the distributed lock that a migration takes on the same sharded collection namespace. By default, an attempt to take a distributed lock times out after 20 seconds, and the lock has no fairness policy, so in the presence of many concurrent migrations a shard key refine can time out waiting for the distributed lock and fail with LockBusy. To handle a similar problem, some config server DDL commands (e.g. _configsvrDropCollection) take a NamespaceSerializer lock before taking the distributed lock. We should be able to use the NamespaceSerializer here as well to avoid distributed lock timeouts.
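
      To illustrate the idea, here is a minimal Python sketch of the locking order, not the server's C++ implementation. LocalNamespaceSerializer, FakeDistLock, and run_with_both_locks are hypothetical names invented for this example. The sketch assumes that every operation on a namespace first queues on an in-process FIFO lock, so that at most one local caller at a time attempts the unfair distributed lock:

```python
import threading
from collections import defaultdict, deque
from contextlib import contextmanager


class LockBusy(Exception):
    """Stand-in for a failure to take the distributed lock."""


class LocalNamespaceSerializer:
    """Hypothetical in-process FIFO lock keyed by namespace string."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._queues = defaultdict(deque)  # namespace -> waiting tickets

    @contextmanager
    def lock(self, namespace):
        ticket = threading.Event()
        with self._mutex:
            queue = self._queues[namespace]
            queue.append(ticket)
            if len(queue) == 1:
                ticket.set()  # nobody ahead of us, proceed immediately
        ticket.wait()
        try:
            yield
        finally:
            with self._mutex:
                queue = self._queues[namespace]
                queue.popleft()
                if queue:
                    queue[0].set()  # wake the next waiter in arrival order
                else:
                    del self._queues[namespace]


class FakeDistLock:
    """Hypothetical unfair lock: fails at once instead of queueing."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = set()

    @contextmanager
    def acquire(self, name):
        with self._mutex:
            if name in self._held:
                raise LockBusy(name)
            self._held.add(name)
        try:
            yield
        finally:
            with self._mutex:
                self._held.discard(name)


def run_with_both_locks(serializer, dist_lock, namespace, work):
    # Take the fair local lock first, then the distributed lock.
    with serializer.lock(namespace):
        with dist_lock.acquire(namespace):
            return work()
```

      In this sketch the local lock only serializes callers in the same process, so it helps only if both the refine and the migrations go through it on the same node.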

      If these changes are too invasive, we should instead modify the refineCollectionShardKey concurrency jstests ("random_moveChunk_refine_collection_*.js") to retry refines that fail with a LockBusy error.
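
      The fallback amounts to a bounded retry loop around the refine. The sketch below is in Python for illustration only (the actual jstests are JavaScript); LockBusyError, retry_on_lock_busy, and the attempt and delay defaults are hypothetical and not taken from the test suite:

```python
import time


class LockBusyError(Exception):
    """Stand-in for a command failing with the LockBusy error code."""


def retry_on_lock_busy(operation, max_attempts=5, delay_secs=0.0):
    """Call operation(), retrying only when it fails with LockBusyError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except LockBusyError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the failure
            time.sleep(delay_secs)
```

      Any other error propagates immediately, so the retry would not mask unrelated refine failures.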

              People

              Assignee:
              matthew.saltz Matthew Saltz
              Reporter:
              jack.mulrow Jack Mulrow
              Votes:
              0
              Watchers:
              3