[SERVER-23411] Inconsistent results when setting a nested field in $project Created: 29/Mar/16  Updated: 06/Dec/22  Resolved: 24/Jun/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-18966 Allow exclusion in $project stage of ... Closed
Related
is related to SERVER-18966 Allow exclusion in $project stage of ... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

If you use a non-inclusion dotted field during a $project stage, you can get different results depending on if the incoming document contains anything in that path:

> db.foo.drop()
true
> db.foo.insert({a: [{b: 1}, {b: 2}]})
WriteResult({ "nInserted" : 1 })
> db.foo.aggregate([{$project: {"x.y": {$literal: 1}}}])
// "x.y" didn't exist, so we created a nested field.
{ "_id" : ObjectId("56fae6c03ef38a5ce210b232"), "x" : { "y" : 1 } }
> db.foo.aggregate([{$project: {"a.b": {$literal: 1}, c: "$a.b"}}])
// "a.b" did exist, so we modified each value of "a.b" to be 1.
{ "_id" : ObjectId("56fae6c03ef38a5ce210b232"), "a" : [ { "b" : 1 }, { "b" : 1 } ], "c" : [ 1, 2 ] }

What's even weirder is this behavior:

> db.foo.find()
{ "_id" : ObjectId("56fae6c03ef38a5ce210b232"), "a" : [ { "b" : 1 }, { "b" : 2 } ] }
> db.foo.aggregate([{$project: {"a.b": {$literal: 1}}}])
// We created a nested field "a.b", even though there was previously a different structure in "a.b".
{ "_id" : ObjectId("56fae6c03ef38a5ce210b232"), "a" : { "b" : 1 } }

Here the document did have something in "a.b", but nothing else was using it, so it was projected out, hence the different behavior from the example above, which specified {c: "$a.b"} so that the field "a.b" was needed, and hence remained in the incoming document.

> db.foo.explain().aggregate([{$project: {"a.b": {$literal: 1}}}])
{
	"waitedMS" : NumberLong(0),
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"_id" : 1
				},
...

> db.foo.explain().aggregate([{$project: {"a.b": {$literal: 1}, c: "$a.b"}}])
{
	"waitedMS" : NumberLong(0),
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"fields" : {
					"a.b" : 1,
					"_id" : 1
				},
...



 Comments   
Comment by Charlie Swanson [ 24/Jun/16 ]

We went with the first approach listed above.

This was resolved with 752eb0f448987b3e4183beacc622f4e95e7803be

Comment by Charlie Swanson [ 29/Mar/16 ]

I can think of a couple ways to fix this:

  • Whenever you set a dotted field in a $project stage, add it as a dependency.
    • This would always 'traverse' the input document, and replace any occurrences with the new value.
    • You could instead use the syntax {$project: {a: {b: <value>}}} if you actually wanted to create a new nested document, replacing the existing structure.
    • This could potentially cause an aggregation to lose the benefit of an index scan. If you had an index on {a: 1}, then the pipeline {$project: {_id: 0, a: 1}} could use a covered index scan, since it only needs the field "a", but the pipeline {$project: {_id: 0, a: 1, "b.c": "$a"}} could not, since it would also need to look at "b.c" to see if it needs to add a field or traverse and replace things in "b.c".
  • Whenever you set a dotted field "x.y....z" in a $project stage, always set it to be {x: {y: { ... { z: <value> }}}.
    • This would avoid adding extra dependencies in the case where you actually wanted to set a new field.
    • We could add some sort of expression or stage that would allow you to 'traverse' the input shape and replace values as needed.
Generated at Thu Feb 08 04:03:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.