[SERVER-4960] Optimize shard selection on $in queries Created: 13/Feb/12 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 5 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: |
Query Execution
|
| Participants: |
| Case: | (copied to CRM) |
| Description |
|
Selecting the appropriate shard to send a query to takes a long time on $in queries. |
| Comments |
| Comment by Spencer Brody (Inactive) [ 11/Apr/13 ] |
|
I'm moving the part of this ticket about re-writing the query sent to the shards to |
| Comment by Antoine Girbal [ 01/Apr/13 ] |
|
In a use case, the $in has 50 items, so the targeted shards will most likely almost always be all shards. This may create race condition issues in the context of migrations, though. |
| Comment by Ben [ 16/May/12 ] |
|
Is it possible to hint a query as an all-shards query? For large $in queries it seems like most of the effort in getShardsForQuery is wasted anyway.
| Comment by Spencer Brody (Inactive) [ 30/Apr/12 ] |
|
@Ben, my timing didn't get that fine-grained. I just verified that a large $in query (with no actual data) took longer on mongos than the sum of the times on the mongods, by a large enough margin to seem like more than the expected overhead; most likely due to the shard selection code. |
| Comment by Ben [ 28/Apr/12 ] |
|
Spencer, did you say you were able to reproduce a slow FieldRange constructor with a large $in query? From my timing blocks, it seemed like inserting into `set<BSONElement, element_lt> vals;` was going slowly (not sure why). |
| Comment by Aaron Staple [ 13/Feb/12 ] |
|
Thanks Spencer. Just in case this ticket gets picked up in the future, here's some info from one more email: It's worth keeping an eye on how the performance of the overall operation changes in the new implementation when C is high and R is low. There are various optimizations that can be done (with additional work); it would also be possible to use a hybrid of the current and proposed fancy implementations, or to choose just one of them selectively based on the values of C and R. Also... One other note: that stuff about # chunks above is really # chunk ranges. I've been assuming that in the worst case we might have really fragmented chunks on all shards but one, so that # chunk ranges ~ # chunks but there is still a good chance of a query not hitting all shards. But I don't know if this is actually right. |
| Comment by Spencer Brody (Inactive) [ 13/Feb/12 ] |
|
Some notes about how to do this, taken from an email with Aaron: The btree scanning code uses FieldRangeVectorIterator to traverse [...]. Also, the FieldRangeVectorIterator interface is [...] I think a better option might be to ignore FieldRangeVectorIterator and write a function similar to FieldRangeVector::matches(), something like FieldRangeVector::matchesBetween( start, end ). At a high level, you will basically want to check FieldRangeVector::matchingLowElement for all the field ranges of the start and end keys, and then see if any matches are possible between these low element indexes. Also be sure to keep in mind the inclusivity of the interval bounds. Basically, the way this works: if you have a query like {a:{$in:[1,2]},b:{$in:[3,4]}} and index {a:1,b:1}, your FieldRangeVector will keep a structure like { a: [ [1,1], [2,2] ], b: [ [3,3], [4,4] ] }. matchingLowElement for a will do a binary search on the bounds of field a, so a binary search on [ 1, 1, 2, 2 ] will return the index of the leftmost value below a match, I think. So if your query element is 0 you will get -1, if 1 you will get 0, if 1.5 you will get 1, if 2 you will get 2, etc. The parity of the index tells you whether the element matches (see FieldRangeVector::matchesElement()). (This is my recollection of how it works, at least.) [...] So if C = # chunks and R = the combinatorial # of valid key ranges that need to be scanned, the current implementation is O(R*log(C)+C). I'm thinking that we might flip things around and do an implementation that is O(C*log(R)), since I'm guessing that in the problematic cases R >> C. (Even if R is not that much bigger, the implementation will save us from precomputing the different ranges, which is the bulk of the time in the case you are looking at now. But in cases with high R it will help a lot. Btw, I don't really know what real-world values of C might be, so feel free to correct me if this sounds wrong.
I can definitely see R growing faster than C more easily; we have a hard cap on R right now, which looks like 1 or 4 million (someone seems to have added inconsistent caps in different places; we should fix that), but we could probably allow a lot higher.) One way we could do this would be to iterate through the chunks, and for each chunk do a binary search through the combinatorial space of valid field ranges looking for the start and end of the chunk, then see whether there are any valid field ranges between the two. We can do this binary search directly with a FieldRangeVector (after adding on some functionality), often without exploring large portions of the set product space. A simple example: if I'm matching values in { 1, 3, 5, 7 } and my chunk is [2,2.5), I do a binary search for 2 and a binary search for 2.5, see that they land in the same spot, and match nothing. If my chunk is [4,6), I do the same thing and can see that the binary searches send me to different positions, meaning there is a match between them. Similarly, for a compound key: if I'm matching values in { 1, 3, 5, 7 } x { 1, 3, 5, 7 } == { <1,1>, <1,3>, ... } and my chunk is [<1,8>,<2,2>), the start and end key are in the same region of the set product range space and not within a valid range; but if my chunk is [<1,8>,<3,2>), the start and end bounds aren't in the same region, so there must be a region between them with matches. And we can determine this without generating or exploring the whole set product. Another important optimization that would make sense here: we should only do this binary search and check the ranges for a chunk if that chunk's shard is not already in the set of added shards. There is an alternative implementation where we would iterate through the combinatorial space of matching ranges in parallel with the iteration through chunks.
And we would basically be improving on binary search (in an amortized sense) to find where in the space of field bounds the start and end keys of the chunks lie, because we are iterating through the chunks in order and can exclude regions of the binary search space below the previously searched value. This is somewhat related to what the FieldRangeVectorIterator does, but I think it would be more work to implement. |