[SERVER-30145] compound multikey index doesn't use tight bounds for trailing field in spite of $elemMatch Created: 14/Jul/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 4
Labels: query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Sample data:

db.elem.find()
{ "_id" : ObjectId("5967e1cbb6a1aa8d363befdd"), "a1" : [ { "id" : "a", "a2" : [ { "s" : 23 }, { "s" : 28 }, { "s" : 30 } ] } ] }
{ "_id" : ObjectId("5967e1d2b6a1aa8d363befde"), "a1" : [ { "id" : "b", "a2" : [ { "s" : 23 }, { "s" : 28 }, { "s" : 30 } ] } ] }

And index on "a1.id":1, "a1.a2.s":1 I would expect the query below to use tight bounds on "a1.id" and on "a1.a2.s" but it uses Minkey,Maxkey on "a1.a2.s":

db.elem.find({a1:{$elemMatch:{ id: "a", a2:{$elemMatch:{s:{$gt:25,$lt:29}}}}}}).explain(true)
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "test.elem",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "a1" : {
                "$elemMatch" : {
                    "$and" : [
                        {
                            "a2" : {
                                "$elemMatch" : {
                                    "$and" : [
                                        {
                                            "s" : {
                                                "$lt" : 29
                                            }
                                        },
                                        {
                                            "s" : {
                                                "$gt" : 25
                                            }
                                        }
                                    ]
                                }
                            }
                        },
                        {
                            "id" : {
                                "$eq" : "a"
                            }
                        }
                    ]
                }
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "filter" : {
                "a1" : {
                    "$elemMatch" : {
                        "$and" : [
                            {
                                "id" : {
                                    "$eq" : "a"
                                }
                            },
                            {
                                "a2" : {
                                    "$elemMatch" : {
                                        "$and" : [
                                            {
                                                "s" : {
                                                    "$lt" : 29
                                                }
                                            },
                                            {
                                                "s" : {
                                                    "$gt" : 25
                                                }
                                            }
                                        ]
                                    }
                                }
                            }
                        ]
                    }
                }
            },
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "a1.id" : 1,
                    "a1.a2.s" : 1
                },
                "indexName" : "a1.id_1_a1.a2.s_1",
                "isMultiKey" : true,
                "multiKeyPaths" : {
                    "a1.id" : [
                        "a1"
                    ],
                    "a1.a2.s" : [
                        "a1",
                        "a1.a2"
                    ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "a1.id" : [
                        "[\"a\", \"a\"]"
                    ],
                    "a1.a2.s" : [
                        "[MinKey, MaxKey]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    }

If the data is structured with a scalar in a2 (or any other way such that the query on the indexed field is of form {{$elemMatch:{$gt: , $lt }}} then the tight bounds are used.



 Comments   
Comment by Asya Kamsky [ 08/Jun/18 ]

This issue has not been fixed - once it is scheduled and resolved, only then do we continue if it's feasible to backport the fix to older versions.

Comment by Rajasekar Ramamurthy [ 08/Jun/18 ]

We are looking out for a fix in 3.6 release version of this bug

Comment by Tess Avitabile (Inactive) [ 28/Jul/17 ]

We should be able to compound these bounds because although a1.id and a1.a2.s share a multikey path prefix, the predicates are joined by an $elemMatch on that path prefix.

The problem is indeed that we only keep track of one elemMatchExpr per predicate, that of the innermost $elemMatch containing the predicate. This means that we only allow two predicates that share a multikey path prefix to be compounded if their innermost $elemMatch is the same and is inside the multikey path prefix. Instead, we should allow two predicates that share a multikey path prefix to be compounded if their outermost $elemMatch that is inside the multikey path prefix is the same. Equivalently, if they have any common $elemMatch inside the multikey path prefix.

We could fix this by storing every $elemMatch expression containing a predicate. Then for the first predicate we index, we could compute the $elemMatch expressions containing it that are inside of the multikey path prefix here (or just the outermost one). When we attempt to compound with a second predicate, we could compare the stored $elemMatch(s) against the set of $elemMatch expressions containing the second predicate here.

Comment by Ian Whalen (Inactive) [ 28/Jul/17 ]

We think this has to do with only keeping track of the innermost elemmatch context for each predicate. Tess to dig further with Asya to figure out what's causing this.

Generated at Thu Feb 08 04:22:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.