[SERVER-32494] $lookup should have option not merge duplicate array item in result (not auto unique) Created: 31/Dec/17  Updated: 27/Oct/23  Resolved: 02/Jan/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Jame Yang Assignee: Backlog - Query Team (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-22881 lookup between local (multiple)array... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Think below $lookup aggregate:

use test
db.test.insert([{id:1, x:10}, {id:2, y:11}])
db.test.insert({id:3, a:[{id:1}, {id:2}, {id:1}]})
db.test.aggregate([ 
  {$match:{id:3}}, 
  {$lookup: {from:'test', localField:'a.id', foreignField:'id', as:'xyz'}},  
  {$project:{xyz:1}}
])

The result have merged result with array length=2, but the source array a has length=3

{"xyz" : [ { "id" : 1, "x" : 10}, { "id" : 2, "y" : 11} ] }

The above result can cause issue in client application since if the id is same, the result will merged, different length from source, but if all the source id is unique, they have identical length.

The problem here is, it's related to input data, the application cannot predict whether the result is merged, and the 1-1 relation from source array to result array is broken!

Maybe can add an option to $lookup stage object say "unique: false" ? And not merge the result automatically.



 Comments   
Comment by Asya Kamsky [ 29/Jul/19 ]

For anyone who finds this ticket looking for the same workaround, this is a way you can do it:

db.test.aggregate([
    {$match:{id:3}},
    {$lookup: {from:'test', localField:'a.id', foreignField:'id', as:'xyz'}}, 
    {$project:{xyz:{
       $map:{
          input:"$a.id", 
          as:"a",
          in:{$filter:{
              input:"$xyz",
              cond:{$eq:["$$this.id","$$a"]}
          }}
       }
    }}}
]

This uses original array "a" to create the new array "xyz" with complete set of documents and same order.

Comment by Jame Yang [ 03/Jan/18 ]

Hi Nick,

Thank you for the reply. This example is very simple one, in real world, $unwind then $group again will lost other keys in documents. Like below:

db.test.insert({id:3, a:[{id:1}, {id:2}, {id:1}], b:2,c:3})
db.test.insert({id:3, a:[{id:2}, {id:1}, {id:1}], d:4,e:5})

If most of the documents have different keys, how to $group by _id? I've searched many and tried many before post this issue, I've tried all the methods, include your example solution, but after $group, you have to include all the possible keys like b,c,d,e etc, that impossible when documents have different keys and some times cannot predict.

Comment by Nicholas Zolnierz [ 02/Jan/18 ]

Hi futurist,

The behavior you describe works as designed since the equality match is translated to

foreignField: { $in: [ localField.elem1, localField.elem2, ... ] }

when the localField is an array, which will only return a single document for the id: 1 match. There's some additional discussion on SERVER-22881 which changed the way $lookup handles local array values.

Please also take a look at this example on how to workaround this. In your case, one way to do this is to add an {$unwind: "$a"} before the $lookup stage to create entries in the result array for each matched element from your local array:

db.test.aggregate([
     {$match: {id: 3}}, 
     {$unwind: "$a"}, 
     {$lookup: {from: 'test', localField: 'a.id', foreignField: 'id', as: 'xyz'}}, 
     {$unwind: "$xyz"}, 
     {$group: {_id: "$_id", xyz: {$push: "$xyz"}}}
])

Regards,
Nick

Comment by Kelsey Schubert [ 31/Dec/17 ]

Hi futurist,

Thank you for the suggestion. I've sent this ticket to the Query Team for consideration; please continue to watch for updates.

Kind regards,
Kelsey

Comment by Jame Yang [ 31/Dec/17 ]

Additional Info:

And with this new options (unique:false?), if 1-1 relation can be created, the order of result array should also keep consistency with source array (here is a").

I've also very struggled with the order of result array after $lookup, the order of array item is unknown/uncertain even the lengths are the same! The unique:false? option should also make the source and result array the real 1-1 relation, the length, and the order.

Generated at Thu Feb 08 04:30:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.