-
Type:
Improvement
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Query Optimization
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Imagine an MQL pipeline like the following:
[
{$project: {a: "$b.c"}},
{$match: {a: {$eq: 42}}}
]
At first glance, the $project may appear to be a renaming operation which renames a field "b.c" inside a nested object to a top-level field "a". This in turn would imply that the following $match pushdown optimization is legal:
[ {$match: {"b.c": {$eq: 42}}}, {$project: {a: "$b.c"}} ]
Unfortunately, due to the semantics of MQL this rewrite is not always legal. Imagine that the collection contains the following data:
> db.c.find() { "_id" : ObjectId("68a8d48990da450ab755a037"), "b" : { "c" : 42 } } { "_id" : ObjectId("68a8d48c90da450ab755a039"), "b" : { "c" : 43 } } { "_id" : ObjectId("68a8d4d490da450ab755a03b"), "b" : [ { "c" : [ 42 ] } ] }
Now let's run the original query and the rewritten one:
// Before the rewrite > db.c.aggregate([ ... {$project: {a: "$b.c"}}, ... {$match: {a: {$eq: 42}}} ... ]) { "_id" : ObjectId("68a8d48990da450ab755a037"), "a" : 42 } // After the rewrite > db.c.aggregate([ ... {$match: {"b.c": {$eq: 42}}}, ... {$project: {a: "$b.c"}} ... ]) { "_id" : ObjectId("68a8d48990da450ab755a037"), "a" : 42 } { "_id" : ObjectId("68a8d4d490da450ab755a03b"), "a" : [ [ 42 ] ] }
As you can see above, the two versions of the query agree with respect to the scalar data (where no arrays are involved), but the rewrite is incorrect when the data contains arrays. In short, projection operations involving dotted paths are not renames but rather they are restructuring operations.
In our implementation, these rename-like operations involving dotted paths are referred to as "complex renames" and have some special handling. The rewrite described above is not currently implemented in the system because it is only correct in the absence of arrays. This ticket observes that if the system can prove the absence of arrays, then a "complex rename" can be treated just like a regular rename and the $match swapping rewrite can be performed! Today, the absence of arrays can be proved through the multikeyness metadata tracked alongside indexes, but in the future this could be derived from other sources such as schema validators.
Note that this specific situation is common in the context of relational migrations. Expressing join operations in MQL involves $lookup or $lookup-$unwind pairs. The semantics of $lookup-$unwind are to nest the results from the inner side inside a subobject (it's a "nest join"). The application may want the data returned in a flattened tabular format, which requires lifting fields from the nested object to the top-level. If the application wants to perform some filtering after the join operation, then we get into the exact situation described here:
db.A.aggregate([
{$lookup: {
from: “B”,
localField: “a”,
foreignField: “b”,
as: “B”}
},
{$unwind: “B”},
{$project: {liftedField: "$B.liftedField"}},
{$match: {liftedField: {$eq: 42}}},
]);
The solution for this ticket should cover complex renames in both $project and $addFields. It need not make any improvements around $replaceRoot.
- related to
-
SERVER-110254 Support swapping $match before "complex renames" when a flag says there are no arrays
-
- Closed
-