[SERVER-40362] expressive $lookup "let" with missing field cannot be optimized Created: 27/Mar/19  Updated: 23/Jan/24

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 2
Labels: query-44-grooming, storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-42738 Slow $lookup on $expr match with null... Closed
Related
related to SERVER-34927 allow localField and foreignField wit... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

*EDIT* Note that in aggregation expressions null and missing are not equal (i.e $eq of two of them will not return true) so technically this works "correctly" according to aggregation expression semantics, but it's not consistent with $lookup with localField/foreignField which uses find/match equality semantics.

Discovered when debugging expressive lookup not using _id index when looking up based on non-existent field:

{$lookup:{
    from:"coll",as:"x", 
    let:{i:"$foobar"},
    pipeline:[{$match:{$expr:{$eq:["$_id","$$i"]}}}]
}}

Because evaluation of "$foobar" in let doesn't resolve it to a constant expression, the inner pipeline cannot rewrite $expr and doesn't use an index.

Reported on 3.6, tested and reproduced on 4.1.9.



 Comments   
Comment by Ian Boros [ 26/May/20 ]

asya I think if we had a special way of representing missing in an index/BSON, this would be more straightforward. Any special logic about using INEXACT_FETCH bounds in my patch above (which was necessary to disambiguate 'null' values from 'missing' values) would no longer be necessary.

That said, this entire system of converting aggregation $eq expressions into an internal expression in the find layer is already kind of a hack. I'm guessing we would be willing to do this when we can distinguish null and missing in indexes but I cannot say for sure. Some of the details about how to do this would depend on the design of "Distinguish Null/Missing in Indexes."

Comment by Asya Kamsky [ 22/May/20 ]

ian.boros:

Will this be simpler to address (in a non-hacky way) if our indexes represent null and missing differently? (i.e. we will already have to add some missing representation?)

Comment by Ian Boros [ 18/Apr/19 ]

Here's the deal:

In the example provided, "$$i" evaluates to "$$REMOVE" or the 'missing' value. We explicitly don't use $internalExprEq for that case, which is why an index isn't used:

db.c.createIndex({b: 1})
db.c.insert({a: 1})
db.c.find({$expr:{$eq:["$b","$$REMOVE"]}}).explain()
// COLLSCAN!

If we wanted to use an index for $eq with 'missing' values, we would need special logic in the planner to create an INEXACT_FETCH plan with bounds of [null, null].

Since we don't have a way of representing a 'missing' value in BSON (our internal representation has BSONType::EOO, but this is not part of the BSON spec, and so cannot be serialized for things like explain), this might get a bit hairy. One approach I can think of would be to modify InternalExpressionEq to have a special "isCheckForMissing" flag. The planner could then check this flag and set the tightness appropriately. Another way would be to just introduce another internal expression. Obviously both of these solutions are not good from a maintainability point of view, so it'd be worth trying to think of others.

Also, note that the $eq agg expression does not treat null equal to missing. e.g.

In other words, for the example given,

{$eq:["$_id","$$i"]}

is not the same as

{$eq:["$_id", null]}

Comment by Asya Kamsky [ 28/Mar/19 ]

A workaround can be:

let:{i:{$ifNull:["$foobar",null]}},

Comment by Asya Kamsky [ 27/Mar/19 ]

Logs show

2019-03-27T20:16:59.977+0000 D QUERY    [conn325] Expression prior to rewrite: {$eq: ["$_id", "$$REMOVE"]}

Generated at Thu Feb 08 04:54:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.