-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Sharding
-
Fully Compatible
-
ALL
-
v4.4
-
Sharding 2020-04-20, Sharding 2020-05-04, Sharding 2020-05-18
-
35
Before executing, _configsvrRefineCollectionShardKey takes distributed locks on both a sharded collection's database and full namespace, the latter of which conflicts with the distributed lock a migration takes on a sharded collection namespace. By default, taking a distributed lock times out after 20 seconds and has no fairness policy, so in the presence of many concurrent migrations, a shard key refine can time out waiting for a distributed lock and fail with LockBusy. To handle a similar problem, some config server DDL commands take a NamespaceSerializer lock before taking the distributed lock (e.g. _configsvrDropCollection). We should also be able to use the NamespaceSerializer here to avoid dist lock timeouts.
If these changes are too invasive, we should instead modify the refineCollectionShardKey concurrency jstests ("random_moveChunk_refine_collection_*.js") to retry refines that fail with a LockBusy error.
- is related to
-
SERVER-36322 NamespaceSerializer lock should be used for dropCollection
- Closed
- related to
-
SERVER-49145 Prevent distributed lock timeouts in suites with background migrations
- Closed