[SERVER-26066] $project should add top-level field to dependencies for nested computed fields
Created: 12/Sep/16  Updated: 12/Dec/19  Resolved: 12/Dec/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.3.3

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Ian Boros
Resolution: Done
Votes: 0
Labels: qexec-team, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-30812 When using an array element as the lo... Closed
Backwards Compatibility: Minor Change
Operating System: ALL
Sprint: Query 2019-12-02, Query 2019-12-16, Query 2019-12-30
Participants:

 Description   

The following example demonstrates an inconsistency within the $project stage:

> db.foo.insert({a: [1, {b: 2}, 3, {}]})  // Note 4 elements in 'a'.
WriteResult({ "nInserted" : 1 })
> db.foo.aggregate([{$project: {"a.b": {$literal: "NEW"}}}]).pretty()
{
	"_id" : ObjectId("57d6d3cd150e60d4d52d9714"),
	"a" : [  // Note there are still 4 elements in 'a'; scalar values have been replaced with documents.
		{
			"b" : "NEW"
		},
		{
			"b" : "NEW"
		},
		{
			"b" : "NEW"
		},
		{
			"b" : "NEW"
		}
	]
}
> db.foo.aggregate([{$project: {"a.b.c": {$literal: "NEW"}}}]).pretty()
{
	"_id" : ObjectId("57d6d3cd150e60d4d52d9714"),
	"a" : [  // Note there are only 2 values in 'a'.
		{
			"b" : {
				"c" : "NEW"
			}
		},
		{
			"b" : {
				"c" : "NEW"
			}
		}
	]
}

This happens because the second projection adds 'a.b' to its dependencies instead of just 'a'. The query layer then returns only the elements of 'a' that are documents, which loses the 'shape' of 'a':

> db.foo.explain().aggregate([{$project: {"a.b.c": {$literal: "NEW"}}}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"a.b" : 1,
					"_id" : 1
				},
    ...  // Other explain info.
}
> db.foo.find({}, {_id: 1, "a.b": 1}).pretty()
{
	"_id" : ObjectId("57d6d443f5f2b41fe6bb267c"),
	"a" : [  // Note only 2 of 4 elements result from this projection.
		{
			"b" : 2
		},
		{
			
		}
	]
}
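
The interaction above can be reproduced outside the server with a small model. This is a minimal sketch in plain JavaScript, not the server's implementation: includePath and setPath are hypothetical helpers that only imitate the behavior shown in the shell output (an inclusion projection on "a.b" drops scalar array elements, while a computed field replaces them with documents), and _id handling is left out.

```javascript
// Simplified model only; the function names are hypothetical.
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Models an inclusion projection such as {"a.b": 1}: arrays are traversed,
// and array elements that are neither documents nor arrays are dropped.
function includePath(value, path) {
  if (Array.isArray(value)) {
    return value
      .filter((el) => isPlainObject(el) || Array.isArray(el))
      .map((el) => includePath(el, path));
  }
  const [head, ...rest] = path;
  const out = {};
  if (head in value) {
    const child = value[head];
    if (rest.length === 0) {
      out[head] = child;
    } else if (isPlainObject(child) || Array.isArray(child)) {
      out[head] = includePath(child, rest);
    }
  }
  return out;
}

// Models a computed field such as {"a.b.c": {$literal: "NEW"}}: every array
// element, scalar or not, is replaced by a document holding the new value.
function setPath(value, path, newValue) {
  if (Array.isArray(value)) {
    return value.map((el) => setPath(el, path, newValue));
  }
  const [head, ...rest] = path;
  if (rest.length === 0) {
    return { [head]: newValue };
  }
  const child = isPlainObject(value) ? value[head] : undefined;
  return { [head]: setPath(child, rest, newValue) };
}

const doc = { a: [1, { b: 2 }, 3, {}] };

// Computed field applied to the whole of 'a': all 4 elements survive.
const fromFullDoc = setPath(doc, ["a", "b", "c"], "NEW");

// Computed field applied after the query layer has fetched only "a.b".
const pruned = includePath(doc, ["a", "b"]);
const fromPruned = setPath(pruned, ["a", "b", "c"], "NEW");

console.log(fromFullDoc.a.length, pruned.a.length, fromPruned.a.length); // prints: 4 2 2
```

In this model the array shrinks only when the computed projection runs on the pre-filtered input, which matches the difference between the two aggregate results.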



 Comments   
Comment by Githook User [ 12/Dec/19 ]

Author: Ian Boros <ian.boros@mongodb.com> (puppyofkosh)

Message: SERVER-26066 Fix dependency analysis for projections with expressions on dotted fields
Branch: master
https://github.com/mongodb/mongo/commit/e699ae35a04c421398adb76002546da720c25673

Comment by Ian Boros [ 20/Nov/19 ]

Well, unfortunately, it turns out this was fixed by accident. The dependency analysis hasn't changed.

> db.c.explain().aggregate([{$unwind: {path: "$b", preserveNullAndEmptyArrays: true}}, {$project: {"a.b.c": "new"}}])
{
	"stages" : [
		{
			"$cursor" : {
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.c",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "51371230",
					"planCacheKey" : "51371230",
					"winningPlan" : {
						"stage" : "PROJECTION_DEFAULT",
						"transformBy" : {
							"_id" : 1,
							"a.b" : 1,
							"b" : 1
						},
						"inputStage" : {
							"stage" : "COLLSCAN",
							"direction" : "forward"
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$unwind" : {
				"path" : "$b",
				"preserveNullAndEmptyArrays" : true
			}
		},
		{
			"$project" : {
				"_id" : true,
				"a" : {
					"b" : {
						"c" : {
							"$const" : "new"
						}
					}
				}
			}
		}
	],
	"serverInfo" : {
		"host" : "borosaurus",
		"port" : 30000,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}
 

That is, we only request "a.b" from the query layer.

However, now that we use the new projection executor, it will insert "MISSING" values into the 'a' array when performing the {"a.b": 1} projection. That is, the find layer produces a document like this:

{_id: 5dd5879a56a821798c22ea8a, a: [MISSING, {b: 2}, MISSING, {}]}

Then the $project stage applied to the array of length 4 produces the expected result. While I'm not sure there's an observable bug here anymore, I suspect that this would get even more complicated when we introduce the fast-path projection executor that operates over BSON.
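
A minimal sketch of why the placeholders hide the problem, in plain JavaScript and under the same simplifying assumptions as any toy model: MISSING is represented by a Symbol, and includePathKeepingShape and setPath are hypothetical names, not server functions.

```javascript
// Simplified model only; names and the Symbol placeholder are assumptions.
const MISSING = Symbol("MISSING");

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Like an inclusion projection on a dotted path, but scalar array elements
// are replaced by a MISSING placeholder instead of being dropped.
function includePathKeepingShape(value, path) {
  if (Array.isArray(value)) {
    return value.map((el) =>
      isPlainObject(el) || Array.isArray(el)
        ? includePathKeepingShape(el, path)
        : MISSING
    );
  }
  const [head, ...rest] = path;
  const out = {};
  if (head in value) {
    const child = value[head];
    if (rest.length === 0) {
      out[head] = child;
    } else if (isPlainObject(child) || Array.isArray(child)) {
      out[head] = includePathKeepingShape(child, rest);
    }
  }
  return out;
}

// Computed field on a dotted path: every array element, including a
// MISSING placeholder, becomes a document holding the new value.
function setPath(value, path, newValue) {
  if (Array.isArray(value)) {
    return value.map((el) => setPath(el, path, newValue));
  }
  const [head, ...rest] = path;
  if (rest.length === 0) {
    return { [head]: newValue };
  }
  const child = isPlainObject(value) ? value[head] : undefined;
  return { [head]: setPath(child, rest, newValue) };
}

const doc = { a: [1, { b: 2 }, 3, {}] };
const intermediate = includePathKeepingShape(doc, ["a", "b"]);
const result = setPath(intermediate, ["a", "b", "c"], "NEW");

console.log(intermediate.a.length, result.a.length); // prints: 4 4
```

Because the intermediate array keeps its length, the later computed projection sees four slots and fills all of them, even though only "a.b" was requested.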

I'm going to see what it would take to fix the analysis so that any expression on "a.b.c" requires 'a' in its entirety.
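
As a rough illustration of that rule, here is a sketch in plain JavaScript. projectionDependencies is a hypothetical helper, not the server's dependency tracker; it assumes a projection spec whose values are either inclusions (1 or true), an _id exclusion, or computed expressions, and it ignores any field references inside the expressions themselves.

```javascript
// Hypothetical helper: returns the sorted list of fields a $project stage
// would request from the query layer under the proposed rule.
function projectionDependencies(spec) {
  const deps = new Set(["_id"]); // _id is included unless explicitly excluded
  for (const [path, value] of Object.entries(spec)) {
    const isInclusion = value === 1 || value === true;
    const isExclusion = value === 0 || value === false;
    if (path === "_id" && isExclusion) {
      deps.delete("_id");
    } else if (isInclusion) {
      // Plain inclusion: the dotted path itself is sufficient.
      deps.add(path);
    } else if (!isExclusion && path.includes(".")) {
      // Computed value on a dotted path: request the whole top-level field
      // so that arrays along the path keep their original shape.
      deps.add(path.split(".")[0]);
    }
    // A computed value on a top-level field needs nothing from its own path.
  }
  return [...deps].sort();
}

console.log(projectionDependencies({ "a.b.c": { $literal: "NEW" } })); // [ '_id', 'a' ]
console.log(projectionDependencies({ "a.b": 1 }));                     // [ '_id', 'a.b' ]
```

Under this rule the explain output for the computed projection would request 'a' rather than 'a.b'.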

CC david.storch anton.korshunov

Comment by Ian Boros [ 12/Nov/19 ]

anton.korshunov Nice find! I'll add it to the epic. This one may just involve writing a test.

Generated at Thu Feb 08 04:11:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.