[SERVER-47330] Extend IDHACK to any unique index where all properties are given as simple "equality" values. Created: 03/Apr/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Bartley Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

The IDHACK is an optimization for _id lookups. It exploits the fact that _id ~always has a unique index; if the internal find cursor yields a document, there’s no need to iterate the cursor any more, or make an additional “step” in the query planner, which improves performance. This could be generalized to apply to any unique index where all properties of the index are given as simple equality values.

For example, given unique index {foo: 1, bar: 1}, a query {foo: 123, bar: "abc", baz: {$lt: 789}} would be able to use this enhanced IDHACK optimization.

A much more common case for us would be $in queries on _id, which I suspect is common to many users, since it's what you'd do to emulate joins. Today, an $in would require 2x the steps/cursor iterations compared to applying the IDHACK.



 Comments   
Comment by David Bartley [ 08/Apr/20 ]

For us it'd be more important for _id $in queries to be optimized.

Comment by Asya Kamsky [ 08/Apr/20 ]

bartle it's not clear how muxh this would change actual performance.  You can see in SERVER-15802 back in 2.6/3.0 days we already added an optimization that will always favor unique index when there is an equality predicate on that field.

This sounds like you're seeing that but would like to see improvement that would not look at additional keys in this case?

So you're looking for

db.foo.find({a:{$in:[5, 10, 20]}}) 

to examine only 3 keys when there is a unique index on `a` instead of the 6 it would currently examine? We already only examine one key if you query with a:5.

Note that IDHACK is currently only applied to equality. If you query with $in against _id and a list of values, we examine 6 keys rather than 3.

So my two questions are:

  1. is this ticket to extend IDHACK to non-_id equality queries
  2. or is it to apply similar logic to IDHACK to $in queries?

For either of these - did you by any chance try the possible code changes and measure how much time/resources would be saved by a typical query?  We don't have a feel for whether that would be a measurable improvement without actually trying and benchmarking it.

Comment by Carl Champain (Inactive) [ 06/Apr/20 ]

Hi bartle,

Thank you for the report.
We're passing this ticket along to the appropriate team for additional investigation. Updates will be posted on this ticket as they happen.

Kind regards,
Carl

Comment by David Bartley [ 03/Apr/20 ]

Oh, I guess the tile is a bit inaccurate, since it'd apply to simple equality (foo: 123), explicit equality (foo: {$eq: 123}), and $in queries (foo: {$in: [123, 456, 789]}).

Generated at Thu Feb 08 05:13:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.