[SERVER-47330] Extend IDHACK to any unique index where all properties are given as simple "equality" values. Created: 03/Apr/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Bartley | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | qopt-team | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: |
| Description |
|
The IDHACK is an optimization for _id lookups. It exploits the fact that _id ~always has a unique index; if the internal find cursor yields a document, there’s no need to iterate the cursor any more, or make an additional “step” in the query planner, which improves performance. This could be generalized to apply to any unique index where all properties of the index are given as simple equality values. For example, given unique index {foo: 1, bar: 1}, a query {foo: 123, bar: "abc", baz: {$lt: 789}} would be able to use this enhanced IDHACK optimization. A much more common case for us would be $in queries on _id, which I suspect is common to many users, since it's what you'd do to emulate joins. Today, an $in would require 2x the steps/cursor iterations compared to applying the IDHACK. |
| Comments |
| Comment by David Bartley [ 08/Apr/20 ] | |
|
For us it'd be more important for _id $in queries to be optimized. | |
| Comment by Asya Kamsky [ 08/Apr/20 ] | |
|
bartle it's not clear how muxh this would change actual performance. You can see in This sounds like you're seeing that but would like to see improvement that would not look at additional keys in this case? So you're looking for
to examine only 3 keys when there is a unique index on `a` instead of the 6 it would currently examine? We already only examine one key if you query with a:5. Note that IDHACK is currently only applied to equality. If you query with $in against _id and a list of values, we examine 6 keys rather than 3. So my two questions are:
For either of these - did you by any chance try the possible code changes and measure how much time/resources would be saved by a typical query? We don't have a feel for whether that would be a measurable improvement without actually trying and benchmarking it. | |
| Comment by Carl Champain (Inactive) [ 06/Apr/20 ] | |
|
Hi bartle, Thank you for the report. Kind regards, | |
| Comment by David Bartley [ 03/Apr/20 ] | |
|
Oh, I guess the tile is a bit inaccurate, since it'd apply to simple equality (foo: 123), explicit equality (foo: {$eq: 123}), and $in queries (foo: {$in: [123, 456, 789]}). |