[SERVER-21697] Plan ranking should take query and index keys into consideration for breaking ties Created: 30/Nov/15 Updated: 11/Sep/23 Resolved: 11/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Optimization |
| Resolution: | Duplicate | Votes: | 8 |
| Labels: | bonsai, storch | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||||||
| Description |
|
In situation where there are two plans that tie, if one matches more (leading?) keys in the query than another then it's more likely to be a better plan for other queries of this shape. Example:
For value of a in query that matches one or no documents all three indexes are going to perform equally (yes?) - we should break the tie in favor of the first one because it matches the most predicate fields in the query. |
| Comments |
| Comment by Alexander Ignatyev [ 11/Sep/23 ] |
|
The fix was implemented in PM-3316 via |
| Comment by David Bartley [ 05/Jun/19 ] |
|
As a concrete example, you can imagine that it's pretty common to shard collections based on a "user" or "account" key, and have ~all queries/indexes start with this "user" key. For an email messages collection with indexes {user: 1, sent_time: -1} and {user: 1, recipient: 1}, a query on user+recipient might perform equally well on both indexes (i.e. tie) for a particular user (imagine a user that only sends emails to a single recipient) or recipient (perhaps the given recipient was simply the most recently emailed), but not others. In practice, we've found that such sub-optimal plans will often make a query that might normally take 1-2 ms take 100's of ms or even 10's of seconds to execute; in the worst case, a sub-optimal plan might require scanning all documents for a given "user". As a result, we generally require all queries to be hinted. This is doable for queries that are statically defined, but is much harder for queries that are dynamically constructed. |
| Comment by David Storch [ 05/Jun/19 ] |
|
In |
| Comment by Asya Kamsky [ 28/Dec/17 ] |
|
Somewhat related to SERVER-20616 since taking full predicate vs indexed fields into account in cases of ties can tip the odds to an index more likely to be more selective (in other scenarios than the one being raced). |