[SERVER-51303] Lookup stage followed by $match on type uses wrong field in matching condition Created: 02/Oct/20  Updated: 29/Oct/23  Resolved: 10/Nov/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.9, 4.4.1, 3.6.20, 4.0.21
Fix Version/s: 4.8.1, 4.4.2, 4.2.11, 3.6.21, 4.0.22

Type: Bug Priority: Blocker - P1
Reporter: Thomas Rueckstiess Assignee: Jacob Evans
Resolution: Fixed Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-1475 {field: {$type:"array"}} should retur... Closed
is related to SERVER-21612 Combine post $lookup $match on looked... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0, v3.6
Steps To Reproduce:

See below.

Sprint: Query 2020-10-19, Query 2020-11-02, Query 2020-11-16
Participants:

 Description   
Issue Status as of Nov 12, 2020

ISSUE DESCRIPTION AND IMPACT

Matching by $type on a looked up field after a $lookup stage in an aggregation operation leads to incorrect results, because the $match on $type is pushed into the $lookup without removing the reference to the new field created by $lookup. This failure is unique to the $type operator.

For example, the following operation matches on "lookedup.name" after a $lookup into the "lookedup" field:

[
  {
    $lookup: {
      from: "second",
      localField: "field",
      foreignField: "foreignField",
      as: "lookedup",
    },
  },
  { $unwind: "$lookedup" },
  { $match: { "lookedup.name": { $type: "string" } } },
]

MongoDB incorrectly optimizes the aggregation pipeline by attempting to match on "lookedup.name" in the "second" collection, where "lookedup.name" does not exist.

The correct behavior is for the optimized lookup stage to match on “name” instead.

DIAGNOSIS AND AFFECTED VERSIONS

All versions since 3.4.0 are affected by this issue.

Running the .explain() method will help identify the incorrect matching stage.

db.first.explain().aggregate([
  {
    $lookup: {
      from: "second",
      localField: "field",
      foreignField: "foreignField",
      as: "lookedup",
    },
  },
  { $unwind: "$lookedup" },
  { $match: { "lookedup.name": { $type: "string" } } },
]);

If the query optimizer moves the $match stage into the $lookup stage while still referencing the field created by $lookup, the operation is impacted. For example:

{
  $lookup: {
    from: "second",
    as: "lookedup",
    localField: "field",
    foreignField: "foreignField",
    unwinding: { preserveNullAndEmptyArrays: false },
    matching: { "lookedup.name": { $type: [2] } },
  },
};

where

matching: {“lookedup.name”: { $type: [2] } }

should be

matching: {“name”: { $type: [2] } }

REMEDIATION AND WORKAROUNDS
This fix will be included in 4.4.2, 4.2.11, 4.0.22, 3.6.21. To work around this issue, you can add a new field "newField" that represents "lookedup.name", run the match on "newField", then unset "newField".

db.first.aggregate([
  {
    $lookup: {
      from: "second",
      localField: "field",
      foreignField: "foreignField",
      as: "lookedup",
    },
  },
  { $unwind: "$lookedup" },
  { $addFields: { newField: "$lookedup.name" }},
  { $match: { "newField": { $type: "string" } } },
  { $unset: "newField" }
]);

Original Description

A $lookup stage followed by a $match on the type of a looked up field does not work. It seems to be using the wrong field name when the $match gets pushed inside the $lookup stage.

Steps to Reproduce

1. Create two collections as follows

  • collection "first" contains: { "field" : "value" }
  • collection "second" contains: { "foreignField" : "value", "name" : "Thomas" }

2. Run the following aggregation on collection "first" 

db.first.aggregate(
  [
    {$lookup: {
      from: "second", 
      localField: "field", 
      foreignField: "foreignField", 
      as: "lookedup"}
    }, 
    {$unwind: "$lookedup"}, 
    {$match: {"lookedup.name": {$type: "string"}}}
]); 

3. In the explain, you can see that the $match gets pushed into the lookup stage but uses the original field name "lookedup.name" which doesn't exist in the remote collection.

 { { "$lookup" : { "from" : "second", "as" : "lookedup", "localField" : "field", "foreignField" : "foreignField", "unwinding" : { "preserveNullAndEmptyArrays" : false }, "matching" : { "lookedup.name" : { "$type" : [ 2 ] } } } } 

If a non-$type query is used (e.g. equality match), the field instead is just "name", which is the correct behavior. 

{{ "$lookup" : { "from" : "second", "as" : "lookedup", "localField" : "field", "foreignField" : "foreignField", "unwinding" : { "preserveNullAndEmptyArrays" : false }, "matching" : { "name" : { "$eq" : "Thomas" } } } } 

Expected Results

Aggregation returns the document because the field lookedup.name is a string

Actual Results

Aggregation returns no documents.

 

 

Hat tip to nathan.smyth who discovered this bug.



 Comments   
Comment by Githook User [ 10/Nov/20 ]

Author:

{'name': 'Jacob Evans', 'email': 'jacob.evans@10gen.com'}

Message: SERVER-51303 Fix lookup match absorbtion optimization for $type
Branch: v4.2
https://github.com/mongodb/mongo/commit/ea38428f0c6742c7c2c7f677e73d79e17a2aab96

Comment by Githook User [ 10/Nov/20 ]

Author:

{'name': 'Jacob Evans', 'email': 'jacob.evans@10gen.com'}

Message: SERVER-51303 Fix lookup match absorbtion optimization for $type
Branch: v4.0
https://github.com/mongodb/mongo/commit/7d9b94d035708516ef4a3f1fc0376543138df090

Comment by Githook User [ 10/Nov/20 ]

Author:

{'name': 'Jacob Evans', 'email': 'jacob.evans@10gen.com'}

Message: SERVER-51303 Fix lookup match absorbtion optimization for $type
Branch: master
https://github.com/mongodb/mongo/commit/1ec7eeba23b21952b01050770c6ceb46ac1789f6

Comment by Githook User [ 10/Nov/20 ]

Author:

{'name': 'Jacob Evans', 'email': 'jacob.evans@10gen.com'}

Message: SERVER-51303 Fix lookup match absorbtion optimization for $type
Branch: v3.6
https://github.com/mongodb/mongo/commit/40a9a11992ac554abfd5950371d3d48808b603e2

Comment by Githook User [ 10/Nov/20 ]

Author:

{'name': 'Jacob Evans', 'email': 'jacob.evans@10gen.com'}

Message: SERVER-51303 Fix lookup match absorbtion optimization for $type
Branch: v4.4
https://github.com/mongodb/mongo/commit/15e73dc5738d2278b688f8929aee605fe4279b0e

Comment by David Storch [ 04/Nov/20 ]

I did a bit of digging into the origin of this bug, and it appears that it was introduced with SERVER-21612, which added the initial implementation of the optimization that pushes down $match into the inner branch of $lookup for [$lookup, $unwind, $match] pipelines. In particular, the problematic check for a $type match expression was first added here. Therefore, this bug affects all stable versions since 3.4.0. I've updated the "Affects Versions" field accordingly.

If my archaeology in the git history is correct, this check was not introduced by following some subtle line of reasoning around the semantics of the $type match expression, but rather was added for a more pedestrian reason. The original code from SERVER-21612 attempted to find path-accepting match expressions whose path must be modified as part of the rewrite which pushes down the $match into the $lookup. It did so by calling MatchExpression::isLeaf() and then static_casting to a LeafMatchExpression. However, at the time TypeMatchExpression did not inherit from LeafMatchExpression, but instead inherited directly from the MatchExpression base class. Therefore, the static_cast would be illegal for TypeMatchExpression, and hence $type was erroneously excluded by the code which modifies the match expression paths. The TypeMatchExpression was later changed to inherit from LeafMatchExpression as part of SERVER-1475 as expected, so the check in the $lookup-$match code for $type appears to be completely unnecessary in recent versions.

Generated at Thu Feb 08 05:25:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.