[SERVER-38344] Early release of distributed database locks during initial collection sharding results in migration/split failures
Created: 30/Nov/18  Updated: 29/Oct/23  Resolved: 12/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.17, 3.6.9
Fix Version/s: 3.6.11

Type: Bug
Priority: Major - P3
Reporter: Eric Sedor
Assignee: Blake Oler
Resolution: Fixed
Votes: 0
Labels: ShardingRoughEdges, neweng, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Sprint: Sharding 2019-01-14, Sharding 2019-02-25

Description

This is for 3.6, with backport requested to 3.4.

When multiple collections are sharded simultaneously, "[t]he shardCollection distributed locks are only held until the initial 'large' chunks are created and they are dropped just before we run the migrateAndFurtherSplitInitialChunks stage, which is what spreads them evenly across the cluster. Because the dist lock is dropped, the second collection's sharding begins and then we start having migrations that run in parallel on the same shard (the primary shard) and this is not allowed, so one of the initial splits would fail." As a result, the initial chunks can end up unevenly distributed across the cluster.

The initially proposed solution is to "[release] the collection dist lock before the moves, but leave the two database locks held until after the migrates and splits have completed. This will work because split and migrate only takes collection dist lock."
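
The lock ordering in the proposal can be illustrated with a small toy model. This is only a sketch, not the server's code: DistLockRegistry, shard_collection, and the during_moves callback are hypothetical names, the registry is an in-memory stand-in for the real distributed lock manager, and a single database lock is modelled for brevity. The callback stands in for the migrate-and-further-split stage, so a second shardCollection attempt can be made while that stage is "running".

```python
import threading


class DistLockRegistry:
    """Toy in-memory stand-in for a distributed lock manager (hypothetical)."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def try_acquire(self, name):
        # Non-blocking acquire: returns False if the named lock is already held.
        with self._guard:
            if name in self._held:
                return False
            self._held.add(name)
            return True

    def release(self, name):
        # Idempotent release, so cleanup code can call it unconditionally.
        with self._guard:
            self._held.discard(name)


def shard_collection(locks, db, coll, during_moves, hold_db_lock=True):
    """Simulate the stages of sharding one collection.

    hold_db_lock=True models the proposed behaviour; False models the
    early release described in this ticket.
    """
    db_lock = f"db:{db}"
    coll_lock = f"coll:{db}.{coll}"
    if not locks.try_acquire(db_lock):
        return "busy"
    if not locks.try_acquire(coll_lock):
        locks.release(db_lock)
        return "busy"
    try:
        # Stage 1: create the initial "large" chunks (omitted in this sketch).
        # Release the collection lock before the moves, because split and
        # migrate take the collection dist lock themselves.
        locks.release(coll_lock)
        if not hold_db_lock:
            # Early release: the database lock is dropped too, so another
            # shardCollection on the same database can start right away.
            locks.release(db_lock)
        # Stage 2: migrate and further split the initial chunks.
        during_moves()
    finally:
        locks.release(coll_lock)
        locks.release(db_lock)
    return "ok"
```

With hold_db_lock=True, a second shard_collection call on the same database made from inside during_moves returns "busy" instead of starting overlapping migrations on the primary shard. With hold_db_lock=False it returns "ok", which is the overlap described above.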

This is not an issue in 4.0.4.



Comments
Comment by Githook User [ 12/Feb/19 ]

Author: Blake Oler (email: blake.oler@mongodb.com, username: BlakeIsBlake)

Message: SERVER-38344 Hold database distlocks during migration and split
Branch: v3.6
https://github.com/mongodb/mongo/commit/9d06566d7f8c9affbadd2bb9c54bb94457863dd0
