[SERVER-25120] aggregation requests generated field name from query Created: 18/Jul/16  Updated: 04/Apr/23  Resolved: 27/Mar/23

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.3.9
Fix Version/s: 7.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Alyssa Clark
Resolution: Done Votes: 0
Labels: asya, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Related
is related to SERVER-72020 [CQF] Make sure computed fields are n... Backlog
is related to SERVER-13703 Presence of extraneous $project cause... Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Sprint: Query 2017-03-27, Query 2017-04-17, QO 2023-03-20
Participants:
Linked BF Score: 151

 Description   

Aggregation pipeline sends down to query fields to return which are generated later in the pipeline (as result of "as" in $lookup).

Reproduce:

db.companies.explain().aggregate({$match:{_id:"14"}},{$graphLookup:{from:"companies",startWith:"$_id",connectFromField:"_id",connectToField:"parent",as:"tree",maxDepth:5, depthField:"order"}},{$group:{"_id":"$tree.country"}})
{
	"waitedMS" : NumberLong(0),
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"_id" : "14"
				},
				"fields" : {
					"tree.country" : 1,
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "d.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : "14"
						}
					},
					"winningPlan" : {
						"stage" : "PROJECTION",
						"transformBy" : {
							"tree.country" : 1,
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "IDHACK"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$graphLookup" : {
				"from" : "companies",
				"as" : "tree",
				"connectToField" : "parent",
				"connectFromField" : "_id",
				"startWith" : "$_id",
				"depthField" : "order",
				"maxDepth" : NumberLong(5)
			}
		},
		{
			"$group" : {
				"_id" : "$tree.country"
			}
		}
	],
	"ok" : 1
}

Note "tree.country" which is coming from the lookup and not original document.

It appears that if project or group needs a field after lookup, there is no filtering out of fields that start with string passed as "as:" to $lookup and it ends up being pushed down into the query.



 Comments   
Comment by Githook User [ 27/Mar/23 ]

Author:

{'name': 'Alyssa Wagenmaker', 'email': 'alyssa.wagenmaker@mongodb.com', 'username': 'awagenmaker'}

Message: SERVER-25120 Exclude generated paths from pipeline dependencies
Branch: master
https://github.com/mongodb/mongo/commit/fa131949d2ca5b3d79aa73783b03f5ad6895aff4

Comment by Asya Kamsky [ 04/Jan/17 ]

Turns out there is a better workaround for this for $lookup, and that's to use the existing localField that's already being fetched as the value for "as" (this only works because localField will have the same value as foreignField, so you don't lose any information).

> db.companies.explain().aggregate({$match:{_id:"14"}},{$lookup:{from:"companies",localField:"_id",foreignField:"parent",as:"newfield"}},{$unwind:"$newfield"}, {$group:{_id:null, fields:{$sum:"$newfield.count"}}})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"_id" : "14"
				},
				"fields" : {
					"newfield.count" : 1,
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "dun.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : "14"
						}
					},
					"winningPlan" : {
						"stage" : "PROJECTION",
						"transformBy" : {
							"newfield.count" : 1,
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "IDHACK"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$lookup" : {
				"from" : "companies",
				"as" : "newfield",
				"localField" : "_id",
				"foreignField" : "parent",
				"unwinding" : {
					"preserveNullAndEmptyArrays" : false
				}
			}
		},
		{
			"$group" : {
				"_id" : {
					"$const" : null
				},
				"fields" : {
					"$sum" : "$newfield.count"
				}
			}
		}
	],
	"ok" : 1
}
// same thing but "as" is set to "_id" instead of "newfield"
> db.companies.explain().aggregate({$match:{_id:"14"}},{$lookup:{from:"companies",localField:"_id",foreignField:"parent",as:"_id"}},{$unwind:"$_id"}, {$group:{_id:null, fields:{$sum:"$_id.count"}}})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"_id" : "14"
				},
				"fields" : {
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "dun.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : "14"
						}
					},
					"winningPlan" : {
						"stage" : "PROJECTION",
						"transformBy" : {
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "IDHACK"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$lookup" : {
				"from" : "companies",
				"as" : "_id",
				"localField" : "_id",
				"foreignField" : "parent",
				"unwinding" : {
					"preserveNullAndEmptyArrays" : false
				}
			}
		},
		{
			"$group" : {
				"_id" : {
					"$const" : null
				},
				"fields" : {
					"$sum" : "$_id.count"
				}
			}
		}
	],
	"ok" : 1
}

Credit for this workaround to this comment.

This workaround may not work the same way in graphLookup (depending on whether you need original/starting connectFromField later in the pipeline).

Comment by Asya Kamsky [ 11/Nov/16 ]

There is a work-around to prevent this from happening and that is to add an explicit $project stage before the lookup:

db.companies.explain().aggregate({$match:{_id:"14"}},{$project:{_id:1}},{$graphLookup:{from:"companies",startWith:"$_id",connectFromField:"_id",connectToField:"parent",as:"tree",maxDepth:5, depthField:"order"}},{$group:{"_id":"$tree.country"}})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"_id" : "14"
				},
				"fields" : {
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.companies",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : "14"
						}
					},
					"winningPlan" : {
						"stage" : "EOF"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$project" : {
				"_id" : true
			}
		},
		{
			"$graphLookup" : {
				"from" : "companies",
				"as" : "tree",
				"connectToField" : "parent",
				"connectFromField" : "_id",
				"startWith" : "$_id",
				"depthField" : "order",
				"maxDepth" : NumberLong(5)
			}
		},
		{
			"$group" : {
				"_id" : "$tree.country"
			}
		}
	],
	"ok" : 1
}

Explicit $project is honored over dependency analysis.

Comment by Asya Kamsky [ 28/Aug/16 ]

We do correctly filter out new fields added in $project stage (that are $literal or $const) though.

db.coll1.explain().aggregate({$match:{_id:1}},{$project:{_id:1,a:"abc"}})
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"_id" : 1
				},
				"fields" : {
					"_id" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "views.coll1",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"_id" : {
							"$eq" : 1
						}
					},
					"winningPlan" : {
						"stage" : "PROJECTION",
						"transformBy" : {
							"_id" : 1
						},
						"inputStage" : {
							"stage" : "IDHACK"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$project" : {
				"_id" : true,
				"a" : {
					"$const" : "abc"
				}
			}
		}
	],
	"ok" : 1
}

Comment by Charlie Swanson [ 18/Jul/16 ]

This looks like a legitimate bug. Our dependency tracking system has no special logic to handle cases where new fields get added to documents, so it's not surprising that it thinks injected fields need to come from the input collection. I'm moving this to "Needs Triage" so we can look at it as a team.

Generated at Thu Feb 08 04:08:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.