[SERVER-25053] removeShard checks are inherently racy
Created: 13/Jul/16 Updated: 09/Sep/20 Resolved: 16/Mar/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.3.9 |
| Fix Version/s: | 4.4.0-rc0, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Alexander Taskov (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | PM-108, sharding-4.4-stabilization, sharding-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Sharding 2020-03-23 |
| Linked BF Score: | 26 |
| Description |
removeShard does a series of checks before marking a shard as "draining" (aka to be removed) on the config server, including:
However, these checks are not guarded by a distributed lock (or even by an in-process lock on a single mongos). As a result, two removeShard requests, sent either to two different mongoses or to the same mongos, can both pass all the checks concurrently and remove two shards at once. This can be fixed by the new locking mechanism being added for the zone sharding project.
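The race is a classic check-then-act problem, and it can be reproduced in a toy, single-process model. This is a minimal sketch, not MongoDB code: the `shards` dictionary is a hypothetical stand-in for the config server's shard registry, `remove_shard_unguarded` is a hypothetical name, and its single check (refuse if another shard is already draining) is an illustrative assumption rather than the exact set of checks removeShard performs. A `threading.Barrier` forces both requests to finish their checks before either one writes, so the bad interleaving happens on every run.

```python
import threading

# Hypothetical stand-in for the config server's shard registry:
# shard name -> whether the shard is marked as draining.
shards = {"shard0": False, "shard1": False, "shard2": False}

# Makes both requests complete their checks before either writes,
# reproducing the unlucky interleaving deterministically.
both_checked = threading.Barrier(2)


def remove_shard_unguarded(name):
    # Check phase (illustrative assumption): no other shard may be draining.
    ok = not any(draining for other, draining in shards.items() if other != name)
    both_checked.wait()
    # Act phase: no lock protects the state between the check and this write.
    if ok:
        shards[name] = True


threads = [
    threading.Thread(target=remove_shard_unguarded, args=(n,))
    for n in ("shard1", "shard2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(n for n, d in shards.items() if d))  # ['shard1', 'shard2']
```

Both requests observe that no other shard is draining, so both mark their shard as draining, even though the check was meant to allow only one.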
| Comments |
| Comment by Githook User [ 26/Mar/20 ] |
Author: Alex Taskov (alextaskov) <alex.taskov@mongodb.com>
Message: (cherry picked from commit 2c19c31f910e5b336b7f3b206a3d57d202100ae6)
| Comment by Githook User [ 16/Mar/20 ] |
Author: Alex Taskov (alextaskov) <alex.taskov@mongodb.com>
Message:
| Comment by Esha Maharishi (Inactive) [ 19/Dec/19 ] |
removeShard does take the new _kShardMembershipLock, but only after checking whether this is the last draining shard. So two concurrent removeShard requests could still both pass that check and then both mark their shards as draining. The lock should probably be taken before any of the checks are done.
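The ordering suggested in the comment, taking the lock before any check, can be sketched in the same toy model. Again this is a minimal single-process sketch under stated assumptions: `shard_membership_lock` is a plain `threading.Lock` standing in for `_kShardMembershipLock` (whose real type and scope are not described here), and `remove_shard` and its draining check are hypothetical. Because the check and the write run inside one critical section, at most one of two concurrent requests can mark its shard as draining.

```python
import threading

# Hypothetical stand-in for the shard registry: shard name -> draining flag.
shards = {"shard0": False, "shard1": False, "shard2": False}

# Hypothetical stand-in for _kShardMembershipLock: an in-process mutex.
shard_membership_lock = threading.Lock()


def remove_shard(name):
    """Return True if this call marked `name` as draining."""
    # Take the lock BEFORE any check, so check and write are one critical section.
    with shard_membership_lock:
        if any(draining for other, draining in shards.items() if other != name):
            return False  # another removal is already in progress
        shards[name] = True
        return True


results = {}


def worker(name):
    results[name] = remove_shard(name)


threads = [threading.Thread(target=worker, args=(n,)) for n in ("shard1", "shard2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results.values()))  # 1
```

Which request wins depends on scheduling, but exactly one succeeds. Narrowing the `with` block so that it encloses only the write (the ordering the comment describes) would reopen the window between the check and the write.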