[SERVER-38344] Early release of distributed database locks during initial collection sharding results in migration/split failures Created: 30/Nov/18 Updated: 29/Oct/23 Resolved: 12/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.17, 3.6.9 |
| Fix Version/s: | 3.6.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Sedor | Assignee: | Blake Oler |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ShardingRoughEdges, neweng, sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v3.4
|
||||||||
| Sprint: | Sharding 2019-01-14, Sharding 2019-02-25 | ||||||||
| Participants: | |||||||||
| Description |
|
This is for 3.6, with a backport requested to 3.4.

When multiple collections are sharded simultaneously, "[t]he shardCollection distributed locks are only held until the initial "large" chunks are created and they are dropped just before we run the migrateAndFurtherSplitInitialChunks stage, which is what spreads them evenly across the cluster. Because the dist lock is dropped, the second collection's sharding begins and then we start having migrations that run in parallel on the same shard (the primary shard) and this is not allowed, so one of the initial splits would fail." As a result, the initial chunks can end up unevenly distributed across the cluster.

The initially proposed solution is to "[release] the collection dist lock before the moves, but leave the two database locks held until after the migrates and splits have completed. This will work because split and migrate only takes collection dist lock."

This is not an issue in 4.0.4. |
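The lock ordering in the proposed solution can be illustrated with a toy, in-process model. This is only a sketch under stated assumptions, not the server's implementation: `DistLockManager`, `shard_collection`, and `run_demo` are hypothetical names, Python `threading.Lock` objects stand in for distributed locks, and a single database-level lock stands in for the database locks mentioned in the ticket.

```python
import threading


class DistLockManager:
    """Hypothetical in-process stand-in for a distributed lock service."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock(self, name):
        # Return the one lock object associated with this resource name.
        with self._guard:
            return self._locks.setdefault(name, threading.Lock())


def shard_collection(locks, db, coll, events):
    """Model of the proposed ordering for one shardCollection call."""
    db_lock = locks.lock(db)
    coll_lock = locks.lock(f"{db}.{coll}")

    # Assumption: the database-level lock is held for the whole operation,
    # including the migrate and split phase.
    with db_lock:
        with coll_lock:
            events.append(("create_initial_chunks", coll))
        # The collection lock is released here, before the moves, because
        # migrate and split each take the collection lock themselves.
        with coll_lock:
            events.append(("migrate", coll))
        with coll_lock:
            events.append(("split", coll))


def run_demo():
    """Shard two collections of the same database concurrently."""
    locks = DistLockManager()
    events = []
    threads = [
        threading.Thread(target=shard_collection, args=(locks, "db", c, events))
        for c in ("a", "b")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for step, coll in run_demo():
        print(coll, step)
```

In this model, the second collection's sharding cannot begin until the first has finished its migrate and split steps, because the database-level lock is still held. Releasing that lock right after `create_initial_chunks` instead would let the two migration phases interleave, which is the failure mode described above.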
| Comments |
| Comment by Githook User [ 12/Feb/19 ] |
|
Author: Blake Oler (email: blake.oler@mongodb.com, username: BlakeIsBlake)
Message: |