[SERVER-49274] Coalesce scans of the same index in plans for rooted ORs Created: 01/Jul/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ian Boros Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-31360 MatchExpression::getOptimizer() for $... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

A rooted OR query with n branches gets turned into a plan with n index scans, even if each scan is over the same index. For example, consider a collection with an index on a:1,b:1. This query will result in a plan with two index scans:

db.c.find({$or: [{a:1,b:1}, {a: 1, b: 2}]}) 

"stage" : "SUBPLAN",
"inputStage" : {
    "stage" : "FETCH",
    "inputStage" : {
        "stage" : "OR",
        "inputStages" : [
            {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "a" : 1,
                    "b" : 1
                },
                "indexName" : "a_1_b_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "a" : [ ],
                    "b" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "a" : [
                        "[1.0, 1.0]"
                    ],
                    "b" : [
                        "[1.0, 1.0]"
                    ]
                }
            },
            {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "a" : 1,
                    "b" : 1
                },
                "indexName" : "a_1_b_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "a" : [ ],
                    "b" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "a" : [
                        "[1.0, 1.0]"
                    ],
                    "b" : [
                        "[2.0, 2.0]"
                    ]
                }
            }
        ]
    }
}

The same logic applies even if there are more than two branches. We know of one instance where a customer was running queries of this form with 20 branches (and was getting 20 IX scans).

It seems like it would be a strict improvement to use a plan which does a single index scan, instead of one per branch. Today, it's possible to "force" this behavior in some cases by using $in. For example:

db.c.find({a: 1, b: {$in: [1,2]}}) 

This query would result in a plan which does a scan over the a:1,b:1 index and uses bounds a:[1,1], b:[1,2].

 

Using $in is not always possible, however. For example, the below query could not be re-written to use $in because doing so would change its meaning:

db.c.find({$or: [{a:1,b:1}, {a: 2, b: 2}]}) 

Using $in:

db.c.find({a: {$in: [1,2]}, b: {$in: [1,2]}}) 

The second query would return documents where a is 1 but b is 2, whereas the first query would not.



 Comments   
Comment by Asya Kamsky [ 07/Jul/20 ]

Linking to SERVER-31360 which is somewhat related...

Comment by Ian Boros [ 01/Jul/20 ]

I'm pretty sure that in order to do this in the general case, we'd need a more expressive way of representing index bounds.

Generated at Thu Feb 08 05:19:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.