[SERVER-21697] Plan ranking should take query and index keys into consideration for breaking ties Created: 30/Nov/15  Updated: 11/Sep/23  Resolved: 11/Sep/23

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Duplicate Votes: 8
Labels: bonsai, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-59337 Bad index selection when using $in Closed
is duplicated by SERVER-80233 Implement index prefix heuristic Closed
is duplicated by SERVER-40844 Better tie breaking of "perfect" indexes Closed
Related
related to SERVER-13211 Optimal index not chosen for query pl... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

In situation where there are two plans that tie, if one matches more (leading?) keys in the query than another then it's more likely to be a better plan for other queries of this shape.

Example:

db.collection.find({a:999, b:111, c:"x"}).sort({d:-1})
// indexes for collection:
{a:1,b:1,c:1,d:1}
{a:1,b:1,d:1}
{a:1,d:1}

For value of a in query that matches one or no documents all three indexes are going to perform equally (yes?) - we should break the tie in favor of the first one because it matches the most predicate fields in the query.



 Comments   
Comment by Alexander Ignatyev [ 11/Sep/23 ]

The fix was implemented in PM-3316 via SERVER-80233.

Comment by David Bartley [ 05/Jun/19 ]

As a concrete example, you can imagine that it's pretty common to shard collections based on a "user" or "account" key, and have ~all queries/indexes start with this "user" key. For an email messages collection with indexes {user: 1, sent_time: -1} and {user: 1, recipient: 1}, a query on user+recipient might perform equally well on both indexes (i.e. tie) for a particular user (imagine a user that only sends emails to a single recipient) or recipient (perhaps the given recipient was simply the most recently emailed), but not others.

In practice, we've found that such sub-optimal plans will often make a query that might normally take 1-2 ms take 100's of ms or even 10's of seconds to execute; in the worst case, a sub-optimal plan might require scanning all documents for a given "user". As a result, we generally require all queries to be hinted. This is doable for queries that are statically defined, but is much harder for queries that are dynamically constructed.

Comment by David Storch [ 05/Jun/19 ]

In SERVER-40844, bartle@stripe.com suggests implementing a simple version of this behavior where there is a scoring bonus associated with a "perfect" match. That is, in the case of a tie between two single-index plans, prefer the plan where the set of paths over which there are equality predicates is identical to the set of paths in the index. For instance, if there are equality predicates on "a", "b", and "c", prefer index {c: 1, a: 1, b: 1} over either {a: 1, b: 1} or {a: 1, b: 1, z: 1}. Implementing this safely could be challenging, since the order of the index key pattern is important and may affect the performance of the plan.

Comment by Asya Kamsky [ 28/Dec/17 ]

Somewhat related to SERVER-20616 since taking full predicate vs indexed fields into account in cases of ties can tip the odds to an index more likely to be more selective (in other scenarios than the one being raced).

Generated at Thu Feb 08 03:58:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.