[SERVER-46487] The mongos routing for scatter/gather ops can have unbounded latency Created: 28/Feb/20 Updated: 29/Oct/23 Resolved: 16/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.0, 4.0.0, 4.2.0 |
| Fix Version/s: | 4.0.19, 4.2.7, 3.6.19, 4.4.0-rc3, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Josef Ahmad | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.4, v4.2, v4.0, v3.6
|
||||||||||||||||
| Sprint: | Sharding 2020-04-20 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||
| Description |
|
Issue Status as of April 2020 ISSUE SUMMARY The mongoS routing of scatter/gather operations iterates over the boundaries for the collection’s chunks sorted in ascending order. As the list is iterated, shards that own a chunk for the collection are discovered. If all shards in the cluster that own a chunk for the collection are discovered partway through the iteration, the iteration early-exits and the mongoS targets all of these shards. This logic is suboptimal if a large number of the sorted chunks are initially owned by a subset of shards. This can result in a large number of iterations needed to discover all shards. As an example, on a 2-shard cluster with a collection where the first 100k sorted chunks are owned on shard 1, the mongoS iterates over the boundaries for 100k chunks when routing a scatter/gather operation. USER IMPACT On a sharded collection where a large number of the sorted chunks (e.g. 100k) are initially owned by a subset of shards, a scatter/gather operation can display increased latency (e.g. milliseconds) or increased mongoS CPU utilization. WORKAROUND Rearrange the chunks in a collection so that the initial chunks in the sorted list discover all the shards. The script below can be used to verify the number of iterations required to route a scatter/gather operation.
|
| Comments |
| Comment by Githook User [ 24/Jun/20 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}Message: (cherry picked from commit d9d92a15561dc84223d302475e09d0fa91309550) |
| Comment by Githook User [ 23/Apr/20 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}Message: (cherry picked from commit d9d92a15561dc84223d302475e09d0fa91309550) |
| Comment by Githook User [ 23/Apr/20 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}Message: (cherry picked from commit d9d92a15561dc84223d302475e09d0fa91309550) |
| Comment by Githook User [ 23/Apr/20 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}Message: (cherry picked from commit d9d92a15561dc84223d302475e09d0fa91309550) |
| Comment by Githook User [ 16/Apr/20 ] |
|
Author: {'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}Message: |
| Comment by Cen Zheng [ 14/Apr/20 ] |
|
Hi, Max,
This is exactly what I mean! I will upload my patch here for your reference. In fact, I didn't realize there is an existing checkAllElementsAreOfType(MinKey, ...) function so I made one in the BSONObj class again. Now that I can use this one instead! Thank you and please forward my regards to Esha! |
| Comment by Max Hirschhorn [ 14/Apr/20 ] |
|
Assigning this one to you gregory.noma.
The early-exiting is referring to this part of ChunkManager::getShardIdsForRange(). As noted in Cen Zhang's comment above, the !_clusterTime part of that condition makes it so this optimization only applies when using the node's view of the latest routing information and not when the atClusterTime read concern option has been specified. Queries that aren't targeted by the shard key will be broadcasted to all shards which own chunks for the collection. The list of shards which own chunks for the collection corresponds to the keys of the RoutingTableHistory::_shardVersions map. We're looking to introduce another optimization (again when the atClusterTime read concern option hasn't been specified) such that if _rt->overlappingRanges(min, max, true) were going to cause all chunks to be iterated over, then we could instead populate the shardIds set with the keys of the RoutingTableHistory::_shardVersions map and skip doing the iteration itself. You could think about it as being RoutingTableHistory::overlappingRanges(min, max, true) returned {RoutingTableHistory::_chunkMap.begin(), RoutingTableHistory::_chunkMap.end()}. I'll leave whether we want to express the optimization that way to you to figure out. Esha had also pointed me to the checkAllElementsAreOfType(MinKey, ...) and checkAllElementsAreOfType(MaxKey, ...) that we do here as a technique we could use to avoid calling RoutingTableHistory::overlappingRanges() by comparing the min and max BSONObjs directly. We'll likely want to backport these changes so their simplicity (e.g. ease of reasoning about their correctness) is also an important factor. As part of this work, we should add a microbenchmark to chunk_manager_refresh_bm.cpp in order to demonstrate the performance improvement being made. I don't believe we have an existing case where ChunkManager::getShardIdsForRange() is called for the range [MinKey, MaxKey]. The optimalShardSelector() chunk distribution should see the most benefit because each shard owning a contiguous series of ranges in-order is when it'd currently take the longest to see all shards by iterating over the ranges. |
| Comment by Cen Zheng [ 13/Apr/20 ] |
|
Hi, As I mentioned in "To resolve this issue, I think we can have a fast path for getShardIdsForRange() when we are not reading from a snapshot and the query range(all shard key fields) is [MinKey, MaxKey], we only need to return all ShardIds through getAllShardIds()." What do you think of this optimization? Is it possible? Looking forward for your feedback. Thanks! |
| Comment by KV Prasad [ 02/Mar/20 ] |
|
Looking for a sooner resolution of this and hope to see it implemented in the upcoming patch. |