[SERVER-43943] $lookup with array local field misbehaves in pipeline Created: 10/Oct/19  Updated: 27/Oct/23  Resolved: 22/Oct/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 4.0.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Adam Rackis Assignee: David Storch
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

db.classes.insert([
  {
    _id: 1.0,
    title: "Reading is ...",
    enrollmentlist: ["giraffe2", "pandabear", "artie"],
    days: ["M", "W", "F"]
  },
  {
    _id: 2.0,
    title: "But Writing ...",
    enrollmentlist: ["giraffe1", "artie"],
    days: ["T", "F"]
  },
  {
    _id: 3.0,
    title: "Math",
    enrollmentlist: ["giraffe1", "artie"],
    days: ["T", "F"]
  },
  {
    _id: 4.0,
    title: "Science",
    enrollmentlist: ["giraffe1", "artie"],
    days: ["T", "F"]
  }
]);

 

 
db.members.insert([
  {
    _id: 1.0,
    name: "artie",
    names: ["giraffe2", "pandabear", "artie"],
    joined: ISODate("2016-05-01T00:00:00.000Z"),
    status: "A"
  },
  {
    _id: 2.0,
    name: "giraffe",
    names: ["giraffe", "yyy"],
    joined: ISODate("2017-05-01T00:00:00.000Z"),
    status: "D"
  },
  {
    _id: 3.0,
    name: "giraffe1",
    names: ["giraffe1", 33],
    joined: ISODate("2017-10-01T00:00:00.000Z"),
    status: "A"
  },
  {
    _id: 4.0,
    name: "panda",
    names: ["panda", "r"],
    joined: ISODate("2018-10-11T00:00:00.000Z"),
    status: "A"
  },
  {
    _id: 5.0,
    name: "pandabear",
    names: ["pandabear", 123],
    joined: ISODate("2018-12-01T00:00:00.000Z"),
    status: "A"
  },
  {
    _id: 6.0,
    name: "giraffe2",
    names: ["giraffe2", "www"],
    joined: ISODate("2018-12-01T00:00:00.000Z"),
    status: "D"
  }
]);

Then observe that

 
db.members.aggregate([
  {
    $lookup: {
      from: "classes",
      let: { names: "$names" }, // ----> $names is an array
      pipeline: [{ $match: { $expr: { $in: ["$$names", "$enrollmentlist"] } } }],
      as: "classes_info"
    }
  }
]);

 
returns empty join results, while

 
db.members.aggregate([
  {
    $lookup: {
      from: "classes",
      let: { names: "$name" },  // ----> $name is a string
      pipeline: [{ $match: { $expr: { $in: ["$$names", "$enrollmentlist"] } } }],
      as: "classes_info"
    }
  }
]);

 
does not. It seems arrays are not represented correctly in the pipeline stage of a $lookup
 
I'm on version 4.0.4 - apologies if this was fixed in a later version - I can't find any references to it, if so.

Sprint: Query 2019-11-04
Participants:

 Comments   
Comment by Adam Rackis [ 23/Oct/19 ]

Wow - that's a stupendous explanation. Thank you so much! 

Comment by David Storch [ 23/Oct/19 ]

adamrackis@hotmail.com, there are two different $in expressions which appear in different context in the grammar of the MongoDB Query Language. Although these two expressions have the same spelling, they are conceptually distinct, and are not expected to have the same syntax or semantics. Your example above uses the $in match expression: see https://docs.mongodb.com/manual/reference/operator/query/in/. It's syntax is

{field: {$in: [val_1, ..., val_n]}}

Its meaning is that the document matches if "field" is equal to val_1 OR val_2 OR ... OR val_n. The notion of equality here involves the expansion of arrays. So if field contains an array with any of the values 1 through n, then the document also matches.

In contrast, the query given in "Steps to Reproduce" uses an $in inside the $expr construct, which means that you're getting the $in agg expression: see https://docs.mongodb.com/manual/reference/operator/aggregation/in/. The syntax and semantics of the $in agg expression are slightly different than that of the match expression. Syntactically, it looks like this:

{$in: [<exp1>, <exp2>]}

Here, <exp2> must evaluate to an array. The expression returns a boolean which indicates whether the value returned by <exp1> is an element of the array returned by <exp2>. For example,

{$in: [3, [1, 2]]} => returns false
{$in: [3, [1, 2, 3]]} => returns true
{$in: [3, [1, 2, {foo: 3}]]} => returns false
{$in: [3, [1, 2, [3, 4]]]} => returns false

You'll notice that in this context, there is no implicit expansion of arrays – implicit array traversal behavior typically applies only to match expressions, not to agg expressions. Does that help to clarify?

Comment by Adam Rackis [ 23/Oct/19 ]

@asya - thanks a ton for the info. The thing I'm not quite understanding, though, is why this query

db.getCollection('type1').aggregate([{$match: { values: { $in: [2] }}}]);

returns this document

{
 "_id" : ObjectId("5dafb35e203fc944d8dd4e44"),
 "name" : "a",
 "values" : [ 1, 2, 3, 4]
}

It seems when $in is used in this way, the array of values we're checking for is flattened out, and each checked individually. But when I use $in inside of $expr, like in the original code, I see the different behavior you describe.

Could I trouble you to help me understand what exactly the difference is?

 

 

Comment by David Storch [ 22/Oct/19 ]

adamrackis@hotmail.com, per Asya's comment above, I am closing this ticket as "Works as Designed". Let me know if you have any further questions!

Comment by Asya Kamsky [ 21/Oct/19 ]

> It seems arrays are not represented correctly in the pipeline stage of a $lookup

adamrackis@hotmail.com, the pipeline is working correctly in this case. Here is what happens:

Inside $expr the aggregation semantics apply and the $in expression checks if the first argument ("$$name" which is an array in your first case) is present as an element in the second argument (an array). The expression correctly evaluates to false and therefore you get an empty join set.

If you wanted to check if the two arrays have any common elements in aggregation, you can use $setIntersection and compare the result to empty array:

db.members.aggregate([
   {
     $lookup: {
       from: "classes",
       let: { names: "$names" },
       pipeline: [{ $match: { $expr: {$ne:[ [], { $setIntersection: ["$$names", "$enrollmentlist"] }]} } }],
       as: "classes_info"
     }
   }
])

If you were translating these examples from simple $lookup with localField/foreignField, note that regular find/$match semantics are used to compare local field and foreign field, which means local field is either equal to foreign field or is equal to an array element of foreign field. In addition, in this case localField as an array is treated as separate elements for the purpose of checking this equality, rather than as an array literal, which may be adding to some confusion here.

Comment by Adam Rackis [ 11/Oct/19 ]

Awesome, thanks! And thanks for cleaning up the formatting on my ticket. Sorry about that! 

Comment by Carl Champain (Inactive) [ 11/Oct/19 ]

Hi adamrackis@hotmail.com,

Thanks for the report.
I was able to successfully test your example. It does seem that if let is an array, then its values are not matched against the other arrays in the $in stage. I'll pass this ticket along to the Query team for additional investigation of this issue. Please watch this ticket for updates.

Kind regards,
Carl

Comment by Adam Rackis [ 10/Oct/19 ]

Apologies for the crappy formatting. I can't seem to find an edit button to try to fix :-| 

Generated at Thu Feb 08 05:04:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.