[SERVER-69119] For queries using SBE, auto-parameterize predicates written using $expr Created: 24/Aug/22  Updated: 14/Mar/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:
Story Points: 10

 Description   

Consider the following two similar-looking queries:

db.coll.find({a: {$eq: "constant"}})
 
db.coll.find({$expr: {$eq: ["$a", "constant"]}})

The queries are quite similar in meaning (though not identical) and both can use an index on {a: 1}. Therefore, you might expect that in both cases the "constant" gets auto-parameterized. However, this is not the case in the current implementation of auto-parameterization for the SBE plan cache. The constant in the first query will get auto-parameterized, but the constant beneath the $expr will not. In fact, we never auto-parameterize anything inside a $expr at the moment.
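
To make the asymmetry concrete, here is a minimal sketch in plain JavaScript. It is a toy model and not the server's parameterization code: the parameterize() helper, the "?" placeholder, and the list of comparison operators are all assumptions made for illustration. It replaces the constant in a match-language comparison and leaves everything under $expr untouched, mirroring the behavior described above.

```javascript
// Toy model only: NOT MongoDB's actual auto-parameterization code.
// parameterize(), the "?" placeholder and COMPARISONS are hypothetical.
const COMPARISONS = new Set(["$eq", "$lt", "$lte", "$gt", "$gte"]);

function parameterize(filter) {
    const shape = {};
    for (const [field, predicate] of Object.entries(filter)) {
        if (field === "$expr") {
            // Constants inside $expr stay inlined, as the ticket describes.
            shape[field] = predicate;
            continue;
        }
        shape[field] = {};
        for (const [op, operand] of Object.entries(predicate)) {
            // Replace the operand of a simple comparison with a placeholder.
            shape[field][op] = COMPARISONS.has(op) ? "?" : operand;
        }
    }
    return shape;
}

const matchShape = parameterize({a: {$eq: "constant"}});
const exprShape = parameterize({$expr: {$eq: ["$a", "constant"]}});

console.log(JSON.stringify(matchShape)); // {"a":{"$eq":"?"}}
console.log(JSON.stringify(exprShape));  // {"$expr":{"$eq":["$a","constant"]}}
```

In this model, any other constant in the first query maps to the same shape, while each distinct constant under $expr yields a distinct shape.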

One place where this can come up is if the $lookup join predicate is expressed using $expr. As an example, consider this $lookup used in TPC-H query 18:

db.getSiblingDB('tpch').orders.aggregate([
    {
        "$lookup": {
            "from": "lineitem",
            "let": {"o_orderkey": "$o_orderkey"},
            "as": "lineitem",
            "pipeline": [
                {"$match": {"$expr": {"$eq": ["$$o_orderkey", "$l_orderkey"]}}},
                {"$group": {"_id": "$l_orderkey", "sum(l_quantity)": {"$sum": "$l_quantity"}}},
                {"$match": {"$expr": {"$gt": ["$sum(l_quantity)", 300]}}},
                {"$project": {"_id": 0, "o_orderkey": "$_id", "sum(l_quantity)": 1}}
            ]
        }
    },
    ...
]);

For every document from the scan of the orders collection, a query is internally composed against the lineitem collection. This query includes the equality predicate "o_orderkey == l_orderkey", expressed using $expr. Each such query has a different constant substituted for "o_orderkey" and therefore, without auto-parameterization of $expr, results in a different plan cache key. Note that this behavior will go away once we implement SERVER-69103, which will prevent SBE from being used on the inner side of a DocumentSourceLookup.
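
The effect on the plan cache can be sketched with another toy model in plain JavaScript. Everything here is an assumption for illustration: the cache key is simply the serialized filter (not the real SBE plan cache key encoding), innerFilterFor() is a hypothetical stand-in for the internally composed query, and the order keys are made-up values.

```javascript
// Toy model only: the "cache key" is the serialized filter, which is an
// assumption and not the real SBE plan cache key encoding.

// Hypothetical stand-in for the inner query after "$$o_orderkey" has been
// replaced by the value taken from the current orders document.
function innerFilterFor(orderKey) {
    return {$expr: {$eq: [orderKey, "$l_orderkey"]}};
}

// When parameterizeExpr is true, numeric constants are replaced by "?"
// before the key is computed; otherwise they are kept inline.
function cacheKey(filter, parameterizeExpr) {
    const replacer = (key, value) =>
        parameterizeExpr && typeof value === "number" ? "?" : value;
    return JSON.stringify(filter, replacer);
}

const orderKeys = [1, 2, 3, 4]; // made-up o_orderkey values

const keysWithoutParams = new Set(
    orderKeys.map(k => cacheKey(innerFilterFor(k), false)));
const keysWithParams = new Set(
    orderKeys.map(k => cacheKey(innerFilterFor(k), true)));

console.log(keysWithoutParams.size); // 4: one cache entry per outer document
console.log(keysWithParams.size);    // 1: all inner queries share an entry
```

Under these assumptions the number of distinct keys grows with the number of distinct o_orderkey values unless the constant is parameterized.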

In order to constrain the scope of this improvement, I imagine we would only implement it for simple equalities and inequalities expressed using $expr. I believe it will require work to make sure that the plans generated by sbe_stage_builder_expression.cpp refer to runtime environment slots that can be rebound rather than inlining constants into the plan.
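
The difference between inlining a constant and reading it from a rebindable slot can be sketched as follows. This is a conceptual model in plain JavaScript, not the SBE implementation: buildInlined(), buildParameterized(), the Map used as a runtime environment, and the slot id are all hypothetical names chosen for illustration.

```javascript
// Toy model only: hypothetical names, not the API of
// sbe_stage_builder_expression.cpp or the SBE runtime environment.

// Inlined: the constant is baked into the compiled predicate, so a cached
// plan is only valid for this one value.
function buildInlined(constant) {
    return doc => doc.a === constant;
}

// Parameterized: the predicate reads its operand from a slot at execution
// time, so the same cached plan can be reused after rebinding the slot.
function buildParameterized(env, slotId) {
    return doc => doc.a === env.get(slotId);
}

const env = new Map();   // stands in for the runtime environment
const SLOT = 1;          // hypothetical slot id
const cachedPlan = buildParameterized(env, SLOT);

env.set(SLOT, "constant");
const firstRun = cachedPlan({a: "constant"});   // true

env.set(SLOT, "other");                         // rebind, no recompilation
const secondRun = cachedPlan({a: "constant"});  // false
const thirdRun = cachedPlan({a: "other"});      // true

const inlinedPlan = buildInlined("constant");
const inlinedRun = inlinedPlan({a: "other"});   // false: cannot be rebound

console.log(firstRun, secondRun, thirdRun, inlinedRun);
```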



 Comments   
Comment by Brenda Rodriguez [ 06/Sep/22 ]

christopher.harris@mongodb.com
