[SERVER-42306] $lookup with an array of sub objects should allow for an option to allow each sub object to find join partners Created: 19/Jul/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mert Hasret Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 4
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-42307 $lookup overwrites the entire object ... Closed
Related
related to SERVER-28717 $lookup does not expand 'localField' ... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   
Schema

x:\{type: String, required: true},
 
a: [{
 
     _b: \{type: Schema.Types.ObjectId, ref: "B", required: true}, 
 
      c: \{type: Number, required: true},
 
     _d: \{type: Schema.Types.ObjectId, ref: "D", required: true}, 
 
      e: { 
 
              f: \{type: Number, required: true}
 
        }     
 
  }]

 

 

Lookup Stage

    {
        "$lookup": {
              from: "B",
              localField: "a._b",
              foreignField: "_id",
              as: "a._b"
          }
    }

Expected Output

    {
        x:/.../,
        a:[{
            _b:[Object],
            c:/.../,
            _d:/.../,
            e:{
                f:/.../
            }
        }]
    }

Output

    {
        x:/.../,
        a: {
 
            _b: [Object]
 
        }
    }

 

 

As you can see above, fields inside the "a" gets overwritten by the $lookup operation, and the field called "a" turns into an object after the aggregation (just like in the behavior of the $unwind operator).

 

 

 



 Comments   
Comment by Vladimir N/A [ 12/Jan/21 ]

@Asya Kamsky Thank you for the workaround. It works fine for me in case of an array of sub objects. But I can't realize how to construct the similar query for array of sub arrays. I mean the case like:

 

db.a.drop();
db.props.drop();
db.a.insert([  
  { "a": [
    { "_id": 1, "props": [
      {prop: 1, extra: 1},
      {prop: 2, extra: 2},
    ] },
    { "_id": 2, "props": [
      {prop: 2, extra: 3},
      {prop: 3, extra: 4},
    ]}
  ]}
]);  
 
db.props.insert([
  { "_id": 1, d:"detail 1"},
  { "_id": 2, d:"detail 2"},
  { "_id": 3, d:"detail 3" }
]);

So, I need to populate a.props.prop. How can I do that?

 

Comment by Asya Kamsky [ 29/Jul/19 ]

Note that behavior when comparing array and non-array with localField/foreignField is frequently unexpected/confusing.

I would recommend we group those in a single project to clarify (and potentially enhance) array behavior in localField/foreignField syntax.

Reusing example from SERVER-28717:

db.a.drop();
db.b.drop();
db.a.insert([
  { "a": [{ "_id": 1, "prop": 2 },{ "_id": 2, "prop": 3 }] }
])
  
db.b.insert([
   { "_id": 1 , d:"detail 1"},
   { "_id": 2 , d:"detail 2"},
   { "_id": 3, d:"detail 3" }
])

There is no way to join information from "b" into array "a" in collection a in a $lookup - it has to be done as a separate step. Worse, as described in this ticket, specifying "as" field as anything starting with "a" as top level unexpectedly removes existing information.

db.a.aggregate([   { "$lookup": {     "from": "b",     "localField": "a._id",     "foreignField": "_id",     "as": "a.details"     }} ])
{ "_id" : ObjectId("5d3f32e0228d30621a762197"), "a" : { "details" : [ { "_id" : 1, "d" : "detail 1" }, { "_id" : 2, "d" : "detail 2" } ] } }

Instead of "a" staying an array with new "details" field, we get new "a.details" array.

The only workaround for this at the moment is to use a different "as" field and then "merge" the original and new arrays.

db.a.aggregate([   { "$lookup": {     "from": "b",     "localField": "a._id",     "foreignField": "_id",     "as": "details"     }}, {$addFields:{fullDetails:{$map:{input:"$a", as:"a", in:{$mergeObjects:["$$a", {details:{$filter:{input:"$details",cond:{$eq:["$$a._id","$$this._id"]}}}}]}}}}} ]).pretty()
{
	"_id" : ObjectId("5d3f32e0228d30621a762197"),
	"a" : [
		{
			"_id" : 1,
			"prop" : 2
		},
		{
			"_id" : 2,
			"prop" : 3
		}
	],
	"details" : [
		{
			"_id" : 1,
			"d" : "detail 1"
		},
		{
			"_id" : 2,
			"d" : "detail 2"
		}
	],
	"fullDetails" : [
		{
			"_id" : 1,
			"prop" : 2,
			"details" : [
				{
					"_id" : 1,
					"d" : "detail 1"
				}
			]
		},
		{
			"_id" : 2,
			"prop" : 3,
			"details" : [
				{
					"_id" : 2,
					"d" : "detail 2"
				}
			]
		}
	]
}

Note that I assumed that there could be multiple matches (in this case matching on _id means there can't be, but in another case that may not hold).

Comment by Asya Kamsky [ 29/Jul/19 ]

According to comments in SERVER-28717 this should have been filed as a follow-on ticket from that case. This comment describes the issue and this comment and two later is the response.

I didn't see that the ticket was ever filed, however, so I think this ticket can be used to track already known limitation as an new functionality.

Comment by Danny Hatcher (Inactive) [ 24/Jul/19 ]

Thank you for the report. I've passed this on to our Query team to take a look.

Generated at Thu Feb 08 05:00:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.