[SERVER-50779] ProjectionExecutor incorrectly leaves tombstones when evaluating arrays Created: 07/Sep/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Arun Banala Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-70860 $addFields recreates an array element... Closed
Related
Assigned Teams:
Query Execution
Operating System: ALL
Sprint: Query 2020-11-30, Query 2020-12-14, Query 2020-12-28, Query 2021-01-11, Query 2021-01-25, Query Execution 2021-03-08, Query Execution 2021-03-22
Participants:
Linked BF Score: 0

 Description   

When the projection path prefix is an array, the ProjectionExecutor code leaves behind a MISSING placeholder (a tombstone) in place of each array element. FastPathEligibleInclusionNode does not appear to have the same issue, because it operates on BSONObj, which has no tombstones.

This can be fixed by returning an empty array if all of the array elements have a MISSING value here.

> db.c.insert({arr: [0]})
// This uses FastPathEligibleInclusionNode
> db.c.aggregate([{$_internalInhibitOptimization: {}},{$project: {"arr.a" : 1}}, {$addFields: { "arr.val" : ""}}]) 
{ "_id" : ObjectId("5f5646abfb04e16296128f3f"), "arr" : [ ] }
// This uses InclusionNode main path
> db.c.aggregate([{$_internalInhibitOptimization: {}},{$project: {"arr.a" : 1, p: {$literal: 1}}}, {$addFields: { "arr.val" : ""}}]) 
{ "_id" : ObjectId("5f5646abfb04e16296128f3f"), "arr" : [ { "val" : "" } ], "p" : 1 }
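
The difference between the two outputs above, and the proposed fix, can be sketched outside the server with a small plain-JavaScript model. This is only an illustration under simplifying assumptions: MISSING, isPlainObject, projectArray, projectArrayFixed, serialize and addFieldToElements are hypothetical names, not server identifiers, and nested arrays are not modeled.

```javascript
// Simplified model only; every name here is hypothetical.
const MISSING = Symbol("MISSING");

function isPlainObject(v) {
    return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Inclusion projection of one field over an array. Elements that are not
// objects have nothing to include, so they become MISSING slots.
function projectArray(arr, field) {
    return arr.map((elem) => {
        if (!isPlainObject(elem)) return MISSING;
        return field in elem ? {[field]: elem[field]} : {};
    });
}

// Proposed fix from the description: return an empty array when every
// element projected to MISSING.
function projectArrayFixed(arr, field) {
    const projected = projectArray(arr, field);
    return projected.every((v) => v === MISSING) ? [] : projected;
}

// Output serialization drops MISSING slots, so tombstones are not printed.
function serialize(arr) {
    return arr.filter((v) => v !== MISSING);
}

// Models {$addFields: {"arr.val": ""}}: every slot, including a tombstone,
// ends up as an object carrying the new field.
function addFieldToElements(arr, field, value) {
    return arr.map((elem) =>
        isPlainObject(elem) ? {...elem, [field]: value} : {[field]: value});
}

const withTombstone = projectArray([0], "a");
const fixed = projectArrayFixed([0], "a");

console.log(withTombstone.length, serialize(withTombstone).length);          // 1 0
console.log(JSON.stringify(addFieldToElements(withTombstone, "val", "")));  // [{"val":""}]
console.log(JSON.stringify(addFieldToElements(fixed, "val", "")));          // []
```

In this model the tombstone is invisible when the array is printed, but the later $addFields-style step still sees one slot and recreates an element, which mirrors the second output above. With the all-MISSING check, the result matches the first output.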



 Comments   
Comment by David Storch [ 03/Nov/22 ]

In addition to the manifestation of this problem originally described by arun.banala@mongodb.com in the ticket description, I've crafted another scenario where tombstones left behind inside arrays affect the correctness of downstream operations in the query execution plan:

(function() {
"use strict";
 
const coll = db.tombstone_repro;
coll.drop();
 
assert.commandWorked(coll.insert({a: 1, b: [{x: 1, y: 1}, 1], c: [{x: 1}]}));
 
const pipeline = [{$group: {_id: "$a", b: {$first: "$b"}, c: {$first: "$c"}}},
                  {$project: {"b.x": 1, "c.x": 1}},
                  {$addFields: {comparison: {$eq: ["$b", "$c"]}}}];
printjson(coll.aggregate(pipeline).toArray());
}());

This script produces the following output:

[
	{
		"_id" : 1,
		"b" : [
			{
				"x" : 1
			}
		],
		"c" : [
			{
				"x" : 1
			}
		],
		"comparison" : false
	}
]

The fact that the comparison field is false seems wrong to me, since "b" and "c" are equal apart from tombstones. The concept of tombstones should be entirely internal to the implementation of the engine, but instead it has real consequences for the behavior of the query that are observable by the end user.
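
Why the comparison can come out false while the printed arrays look identical can be illustrated with a small plain-JavaScript model. It is a sketch under simplifying assumptions rather than the engine's actual logic: MISSING, projectArray, internalEquals and serialize are hypothetical names, and nested arrays are not modeled.

```javascript
// Simplified model only; every name here is hypothetical.
const MISSING = Symbol("MISSING");

// Inclusion projection of one field over an array; elements that are not
// objects become MISSING slots instead of being removed.
function projectArray(arr, field) {
    return arr.map((elem) => {
        const isObject =
            elem !== null && typeof elem === "object" && !Array.isArray(elem);
        if (!isObject) return MISSING;
        return field in elem ? {[field]: elem[field]} : {};
    });
}

// Engine-side equality in this model: slot counts must match, and each
// slot must match, with MISSING equal only to MISSING.
function internalEquals(lhs, rhs) {
    if (lhs.length !== rhs.length) return false;
    return lhs.every((v, i) =>
        v === MISSING || rhs[i] === MISSING
            ? v === rhs[i]
            : JSON.stringify(v) === JSON.stringify(rhs[i]));
}

// What the user sees: MISSING slots are dropped on output.
function serialize(arr) {
    return JSON.stringify(arr.filter((v) => v !== MISSING));
}

const b = projectArray([{x: 1, y: 1}, 1], "x");  // [{x: 1}, MISSING]
const c = projectArray([{x: 1}], "x");           // [{x: 1}]

console.log(internalEquals(b, c));           // false
console.log(serialize(b) === serialize(c));  // true
```

In this model "b" keeps two slots and "c" keeps one, so they compare as unequal even though both serialize to the same array. Only one of the two elements of "b" is MISSING here, so a check that fires only when every element is MISSING would not change this particular result in the model.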

This ticket seems very similar to SERVER-70860, which was recently filed by ivan.fefer@mongodb.com. Ivan and Arun, do you think that SERVER-70860 should be closed as a duplicate of this ticket?
