[SERVER-73792] Partial index isSubsetOf could be more precise for $or / $in Created: 08/Feb/23  Updated: 24/Feb/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: query-product-scope-1, query-product-urgency-3, query-product-value-2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

When a partial filter expression uses $or, we sometimes don't recognize that it can satisfy a query using $or. For example:

> db.c.createIndex({a: 1}, {partialFilterExpression: {$or: [ {a: {$lt: 0}}, {a: {$gt: 10}} ]}})
{
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "createdCollectionAutomatically" : true,
        "ok" : 1
}
> for (a=-2; a<15; ++a) db.c.insert({a})
WriteResult({ "nInserted" : 1 })
> db.c.find({$or: [ {a: -1}, {a: 12} ]}).explain()
{
    ...
                "winningPlan" : {
                        "stage" : "COLLSCAN",
    ...
}

This query should use the index, because any document matching {a: -1} will be indexed, and any document matching {a: 12} will be indexed.



 Comments   
Comment by Ana Meza [ 14/Feb/23 ]

christopher.harris@mongodb.com would you mind looking at this for priority, since Anton thinks this might be one for the backlog. 

Comment by David Percy [ 08/Feb/23 ]

It looks like we're getting the wrong answer (too conservative) because of how we break up isSubsetOf() with $and/$or into subproblems.

There are 4 rules for $and / $or:

  • {$and: [ X, Y ]} subset A if (X subset A) or (Y subset A)
    • $and makes the query smaller; easier to fit into the index
  • {$or: [ X, Y ]} subset A if (X subset A) and (Y subset A)
    • $or makes the query bigger; harder to fit into the index
  • X subset {$or: [ A, B ]} if (X subset A) or (X subset B)
    • $or makes the index bigger; easier for the query to fit
  • X subset {$and: [ A, B ]} if (X subset A) and (X subset B)
    • $and makes the index smaller; harder for the query to fit

Some rules make the subproblems harder, but only one subproblem has to be true. Some rules make the subproblems easier, but all of them have to be true.

The reason the example goes wrong, is that we're making the subproblems too hard:

  • Is {$or: [ {a: -1}, {a: 12} ]} a subset of {$or: [ {a: {$lt: 0}}, {a: {$gt: 10}} ]} ?
    • Is {$or: [ {a: -1}, {a: 12} ]} a subset of {a: {$lt: 0}} ?
      • No.
    • Is {$or: [ {a: -1}, {a: 12} ]} a subset of {a: {$gt: 10}} ?
      • No.

So I think we want to always prefer the rule that makes the subproblems easier.

Note that when we compare $and vs $and, we are getting the right answer, because we're preferring to break up the right-hand side first, which makes the subproblems easier.

Generated at Thu Feb 08 06:25:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.