[SERVER-4960] Optimize shard selection on $in queries Created: 13/Feb/12  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-9205 MongoS does not re-write $in queries ... Closed
Related
related to SERVER-4745 Figuring out which shard to send a qu... Closed
related to SERVER-4555 check and clean sharding utilization ... Closed
related to SERVER-9332 Mongos should re-write $in queries on... Closed
is related to SERVER-50299 Shard Targeting for $in statements Closed
Assigned Teams:
Query Execution
Participants:
Case:

 Description   

Selecting the appropriate shards to send a query to takes a long time on $in queries. SERVER-4745 mitigates the effect of this, but this code could use rewriting and further optimization.



 Comments   
Comment by Spencer Brody (Inactive) [ 11/Apr/13 ]

I'm moving the part of this ticket about re-writing the query sent to the shards to SERVER-9332, and leaving this one to focus on the performance optimizations that can be done here.

Comment by Antoine Girbal [ 01/Apr/13 ]

In one use case the $in has 50 items, so the targeted shards will almost always be all shards.
A good optimization would be for mongos to rewrite the $in list sent to each shard so that it only includes values that may be on that shard.
Mongos needs to look up each value in the chunk list anyway in order to decide which shards to use, so it could build the per-shard $in lists at the same time.

This may create race condition issues in the context of migrations, though.
Alternatively, mongos could send the full $in to all shards, and each mongod would check the values against its chunk ranges first (a much smaller structure than the btree).
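
A minimal standalone sketch of the per-shard rewrite idea, assuming integer shard keys and an in-memory chunk map. None of the types or names below are MongoDB's; partitionInValues() is a hypothetical helper:

// Hypothetical sketch (not MongoDB source): partition an $in list by owning
// shard while doing the same chunk lookup mongos already performs for targeting.
// Chunk boundaries and shard names are illustrative assumptions.
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Chunk {
    long long min;       // inclusive lower bound of the shard-key range
    long long max;       // exclusive upper bound
    std::string shard;   // shard owning this range
};

// Assumes `chunks` is sorted by `min` with non-overlapping ranges,
// mirroring the chunk map mongos keeps in memory.
std::map<std::string, std::vector<long long>> partitionInValues(
        const std::vector<Chunk>& chunks, const std::vector<long long>& inValues) {
    std::map<std::string, std::vector<long long>> perShard;
    for (long long v : inValues) {
        // Binary search for the first chunk whose max is greater than v.
        std::size_t lo = 0, hi = chunks.size();
        while (lo < hi) {
            std::size_t mid = (lo + hi) / 2;
            if (chunks[mid].max <= v) lo = mid + 1; else hi = mid;
        }
        if (lo < chunks.size() && chunks[lo].min <= v)
            perShard[chunks[lo].shard].push_back(v);  // v falls inside chunks[lo]
        // In a real chunk map the ranges cover the whole key space, so every
        // value lands in some chunk; the fallthrough just keeps this toy safe.
    }
    return perShard;
}

int main() {
    std::vector<Chunk> chunks = {{0, 100, "shard0"}, {100, 200, "shard1"}, {200, 300, "shard2"}};
    auto perShard = partitionInValues(chunks, {5, 17, 150, 250, 260});
    for (const auto& [shard, vals] : perShard) {
        std::cout << shard << ": $in [ ";
        for (long long v : vals) std::cout << v << ' ';
        std::cout << "]\n";
    }
}

Each value is located with the same binary search over chunk boundaries that targeting already requires, so building the per-shard lists adds little work beyond the lookup mongos does today.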

Comment by Ben [ 16/May/12 ]

Is it possible to hint a query as an all-shards query? For large $in queries it seems like most of the effort in getShardsForQuery is wasted anyway.

Comment by Spencer Brody (Inactive) [ 30/Apr/12 ]

@Ben, my timing didn't get that fine-grained. I just verified that a large $in query (with no actual data) took longer on mongos than the sum of the mongods, by enough of a margin to seem like more than the expected overhead - most likely due to the shard selection code.

Comment by Ben [ 28/Apr/12 ]

Spencer, did you say you were able to reproduce a slow FieldRange constructor with a large $in query? From my timing blocks, it seemed like inserting into:

set<BSONElement,element_lt> vals;

was going slowly (not sure why).

Comment by Aaron Staple [ 13/Feb/12 ]

Thanks Spencer,

Just in case this ticket gets picked up in the future here's some info from one more email:

It's worth keeping an eye on how the performance of the overall operation changes in the new implementation when there is high C and low R. There are various optimizations that can be done (with additional work); it would also be possible to use a hybrid of the current and proposed fancy implementations, or to choose just one of them selectively based on the values of C and R.

Also...

One other note: that stuff about # chunks above is really # chunk ranges. I've been assuming that in the worst case we might have really fragmented chunks on all shards but one, so that # chunk ranges ~ # chunks while there is still a good chance of a query not hitting all shards. But I don't know if this is actually right.

Comment by Spencer Brody (Inactive) [ 13/Feb/12 ]

Some notes about how to do this taken from an email with Aaron:

The btree scanning code uses FieldRangeVectorIterator to traverse btrees. However, the interface of FieldRangeVectorIterator doesn't make it explicit whether an object matches a FieldRangeVector (an ordered list of valid field ranges, in your case < a ∈ {1,3,...}, b ∈ {1,3,...} >). Also, the FieldRangeVectorIterator interface is optimized to avoid building BSONObjs - it is a special purpose class right now and is probably not particularly easy to use for other things (and I think the original version is about 1.5 years old). I'd recommend that we add another class that can be used for iterating through a list of ranges (in this case shard ranges) that delegates most of its functionality to FieldRangeVectorIterator (and modifies FieldRangeVectorIterator a bit if necessary).

[...]

I think a better option might be to ignore FieldRangeVectorIterator and write a function similar to FieldRangeVector::matches(), something like FieldRangeVector::matchesBetween( start, end ). At a high level you will basically want to check FieldRangeVector::matchingLowElement for all the field ranges of the start and end keys and then see if any matches are possible between these low element indexes. Also be sure to keep in mind the inclusivity of the interval bounds.

Basically the way this works: if you have a query like {a:{$in:[1,2]},b:{$in:[3,4]}} and index {a:1,b:1}, your FieldRangeVector will keep a structure like { a: [ [1,1], [2,2] ], b: [ [3,3], [4,4] ] }. And matchingLowElement for a will do a binary search on the bounds of field a, so a binary search on [ 1, 1, 2, 2 ] will return the index of the leftmost bound equal to the element, or of the greatest bound below it, I think. So if your query element is 0 you will get -1, if 1 you will get 0, if 1.5 you will get 1, if 2 you will get 2, etc. The parity of the index will tell you if the element matches (see FieldRangeVector::matchesElement()). (This is my recollection of how it works, at least.)
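
To make the parity trick concrete, here is a minimal standalone sketch using plain doubles instead of BSONElements. It only reproduces the behaviour described above; the real FieldRangeVector::matchingLowElement may differ in details such as exclusive bounds:

// Toy illustration (not MongoDB source) of the parity check described above:
// flatten one field's intervals into a sorted bounds array and binary-search it.
// All interval bounds are assumed inclusive.
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the index of the leftmost bound equal to v, else the index of the
// greatest bound strictly less than v, else -1.
int matchingLowElement(const std::vector<double>& bounds, double v) {
    auto it = std::lower_bound(bounds.begin(), bounds.end(), v);
    if (it != bounds.end() && *it == v)
        return static_cast<int>(it - bounds.begin());   // leftmost equal bound
    return static_cast<int>(it - bounds.begin()) - 1;   // greatest bound < v, or -1
}

// An element matches iff its low-element index lands on an interval's lower
// bound, i.e. the index is non-negative and even (the parity check).
bool matchesElement(const std::vector<double>& bounds, double v) {
    int idx = matchingLowElement(bounds, v);
    return idx >= 0 && idx % 2 == 0;
}

int main() {
    std::vector<double> aBounds = {1, 1, 2, 2};   // from {a: {$in: [1, 2]}}
    assert(matchingLowElement(aBounds, 0)   == -1 && !matchesElement(aBounds, 0));
    assert(matchingLowElement(aBounds, 1)   ==  0 &&  matchesElement(aBounds, 1));
    assert(matchingLowElement(aBounds, 1.5) ==  1 && !matchesElement(aBounds, 1.5));
    assert(matchingLowElement(aBounds, 2)   ==  2 &&  matchesElement(aBounds, 2));
    return 0;
}

The asserts reproduce the 0 / 1 / 1.5 / 2 examples quoted above.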

[...]

So if C = # chunks and R = combinatorial # of valid key ranges that need to be scanned, the current implementation is O(R*log(C)+C). I'm thinking that we might flip things around and do an implementation that is O(C*log(R)), since I'm guessing that in the problematic cases R >> C. (Even if R is not that much bigger, the new implementation will save us from precomputing the different ranges, which is the bulk of the time in the case you are looking at now; and in cases with high R it will help a lot. Btw, I don't really know what real world values of C might be, so feel free to correct me if this sounds wrong. I can definitely see R growing faster than C more easily - we have a hard cap on R right now, which looks like 1 or 4 million (someone seems to have added inconsistent caps in different places; we should fix that), but we could probably allow a lot higher.)

One way we could do this would be to iterate through the chunks, and for each chunk we do a binary search through the combinatorial space of valid field ranges looking for the start and end of the chunk. And we see if there are any valid field ranges between the start and end. We can do this binary search directly with a FieldRangeVector (after adding on some functionality), often without exploring large portions of the set product space.

So, simple example: if I'm matching values in { 1, 3, 5, 7 } and my chunk is [2,2.5), I do a binary search for 2 and a binary search for 2.5 and see they land in the same spot and match nothing. If my chunk is [4,6), I do the same thing and can see that my binary searches send me to different positions, meaning that there is a match between them.

Similarly for a compound key: if I'm matching values in { 1, 3, 5, 7 } x { 1, 3, 5, 7 } == { <1,1>, <1,3>, ... } and my chunk is [<1,8>,<2,2>), the start and end key are in the same region of the set product range space and not within a valid range; but if my chunk is [<1,8>,<3,2>), the start and end bounds aren't in the same region, so there must be a region between them with matches. And we can determine this without generating or exploring the whole set product.

Another important optimization that would make sense here: only do this binary search and range check for a chunk if that chunk's shard is not already in the set of targeted shards.
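
A standalone, single-field sketch of this direction (not MongoDB source; the chunk boundaries, shard names and the chunkMayContainMatches() helper are illustrative assumptions, and exclusive interval bounds are ignored, which only errs toward targeting a shard unnecessarily):

// Sketch of the proposed O(C*log(R)) direction: walk the chunks and
// binary-search the flattened $in bounds for each chunk's start and end key,
// skipping chunks whose shard is already targeted.
#include <algorithm>
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Chunk {
    double min;          // inclusive start of the chunk's shard-key range
    double max;          // exclusive end
    std::string shard;
};

// Same helper as in the earlier sketch: index of the leftmost equal bound,
// else of the greatest bound < v, else -1.
int matchingLowElement(const std::vector<double>& bounds, double v) {
    auto it = std::lower_bound(bounds.begin(), bounds.end(), v);
    if (it != bounds.end() && *it == v) return static_cast<int>(it - bounds.begin());
    return static_cast<int>(it - bounds.begin()) - 1;
}

// A chunk may contain matches if its start key lies inside a valid range
// (even index) or if a whole valid range lies between its start and end keys.
bool chunkMayContainMatches(const std::vector<double>& bounds, const Chunk& c) {
    int lo = matchingLowElement(bounds, c.min);
    int hi = matchingLowElement(bounds, c.max);
    return (lo >= 0 && lo % 2 == 0) || hi > lo;
}

int main() {
    // $in: [1, 3, 5, 7] flattened into point intervals.
    std::vector<double> bounds = {1, 1, 3, 3, 5, 5, 7, 7};
    std::vector<Chunk> chunks = {{2, 2.5, "shard0"}, {4, 6, "shard1"}, {6, 10, "shard1"}};

    std::set<std::string> targeted;
    for (const Chunk& c : chunks) {
        if (targeted.count(c.shard))        // shard already targeted: skip the search
            continue;
        if (chunkMayContainMatches(bounds, c))
            targeted.insert(c.shard);
    }
    for (const std::string& s : targeted) std::cout << s << '\n';   // prints shard1
}

Run against the { 1, 3, 5, 7 } example above, the [2,2.5) chunk is rejected, the [4,6) chunk targets its shard, and the second chunk on that shard is skipped without any search.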

There is an alternative implementation where we would iterate through the combinatorial space of matching ranges in parallel with the iteration through chunks. We would basically be improving on binary search (in an amortized sense) to find where in the space of field bounds the start and end keys of the chunks lie, because we are iterating through the chunks in order and can exclude the regions of the binary search space below the previously searched value. This is somewhat related to what the FieldRangeVectorIterator does, but I think it would be more work to implement.
