[SERVER-56084] [SBE] aggregation_multiversion_fuzzer fails with SBE on due to different results from $graphLookup Created: 13/Apr/21  Updated: 15/Apr/21  Resolved: 15/Apr/21

Status: Closed
Project: Core Server
Component/s: Query Execution
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Issue Links:
Duplicate
duplicates SERVER-56102 Fix SBE bit-test operators to match c... Closed
duplicates SERVER-53900 Ensure expression arguments checks se... Closed
Related
is related to SERVER-56083 [SBE] Expressions unsupported in SBE ... Closed
Operating System: ALL
Steps To Reproduce:

python3 buildscripts/resmoke.py run --suites=jstestfuzz --additionalFeatureFlags=featureFlagSBE repro.js

Sprint: Query Execution 2021-04-19
Participants:

 Description   

I've attached a repro script, which is just the generated test trimmed down to make it run much faster. The generated test came from this failing task. The offending query is the following:

[js_test:agg_fuzzer-8818-1618157020929-0] 	"command" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 		"aggregate" : "fuzzer_coll",
[js_test:agg_fuzzer-8818-1618157020929-0] 		"pipeline" : [
[js_test:agg_fuzzer-8818-1618157020929-0] 			{
[js_test:agg_fuzzer-8818-1618157020929-0] 				"$sort" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 					"_id" : 1
[js_test:agg_fuzzer-8818-1618157020929-0] 				}
[js_test:agg_fuzzer-8818-1618157020929-0] 			},
[js_test:agg_fuzzer-8818-1618157020929-0] 			{
[js_test:agg_fuzzer-8818-1618157020929-0] 				"$limit" : 20
[js_test:agg_fuzzer-8818-1618157020929-0] 			},
[js_test:agg_fuzzer-8818-1618157020929-0] 			{
[js_test:agg_fuzzer-8818-1618157020929-0] 				"$graphLookup" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 					"from" : "fuzzer_coll",
[js_test:agg_fuzzer-8818-1618157020929-0] 					"startWith" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 						"$setUnion" : [
[js_test:agg_fuzzer-8818-1618157020929-0] 							[
[js_test:agg_fuzzer-8818-1618157020929-0] 								/Dobra/
[js_test:agg_fuzzer-8818-1618157020929-0] 							],
[js_test:agg_fuzzer-8818-1618157020929-0] 							"$array",
[js_test:agg_fuzzer-8818-1618157020929-0] 							"$obj.obj.obj.obj.array"
[js_test:agg_fuzzer-8818-1618157020929-0] 						]
[js_test:agg_fuzzer-8818-1618157020929-0] 					},
[js_test:agg_fuzzer-8818-1618157020929-0] 					"connectFromField" : "obj.obj.obj.obj.obj.obj",
[js_test:agg_fuzzer-8818-1618157020929-0] 					"connectToField" : "obj.obj.obj.obj.date",
[js_test:agg_fuzzer-8818-1618157020929-0] 					"as" : "array",
[js_test:agg_fuzzer-8818-1618157020929-0] 					"depthField" : "obj.obj.num",
[js_test:agg_fuzzer-8818-1618157020929-0] 					"restrictSearchWithMatch" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 						"$or" : [
[js_test:agg_fuzzer-8818-1618157020929-0] 							{
[js_test:agg_fuzzer-8818-1618157020929-0] 								"obj.obj.obj.num" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 									"$not" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 										"$bitsAnyClear" : 1001
[js_test:agg_fuzzer-8818-1618157020929-0] 									}
[js_test:agg_fuzzer-8818-1618157020929-0] 								}
[js_test:agg_fuzzer-8818-1618157020929-0] 							}
[js_test:agg_fuzzer-8818-1618157020929-0] 						]
[js_test:agg_fuzzer-8818-1618157020929-0] 					}
[js_test:agg_fuzzer-8818-1618157020929-0] 				}
[js_test:agg_fuzzer-8818-1618157020929-0] 			}
[js_test:agg_fuzzer-8818-1618157020929-0] 		],
[js_test:agg_fuzzer-8818-1618157020929-0] 		"maxTimeMS" : 30000,
[js_test:agg_fuzzer-8818-1618157020929-0] 		"cursor" : {
[js_test:agg_fuzzer-8818-1618157020929-0] 
[js_test:agg_fuzzer-8818-1618157020929-0] 		},
[js_test:agg_fuzzer-8818-1618157020929-0] 		"$db" : "fuzzer"

Interestingly, due to SERVER-56083, the query does not run using SBE in either the control (4.4) or experimental (5.0) versions. I suspect, however, that the queries executed recursively by $graphLookup end up using SBE, which is where things have gone wrong.

Looking at the output from the fuzzer, I can see that both servers returned 20 documents with the same _id values. However, none of the 20 documents returned by 4.4 were equal to any of the documents returned by 5.0. Drilling down, I looked specifically at the documents returned by each version with _id:0. I can see that the only difference between the two is the "array" field, which is what is produced by the $graphLookup stage.

For the document with _id:0, I dumped the contents of "array" from the lefthand-side server into a collection called "l" and the contents of "array" from the righthand-side server into a collection called "r", and this shows that the lengths of the arrays were different:

MongoDB Enterprise > db.l.count()
190
MongoDB Enterprise > db.r.count()
188

I believe this means that a few documents were missed by the $graphLookup when SBE was enabled. The following query shows exactly which were missed, specifically for the $graphLookup associated with _id:0:

MongoDB Enterprise > db.l.aggregate([{$lookup: {from: "r", as: "match", localField: "_id", foreignField: "_id"}}, {$match: {match: {$size: 0}}}]).pretty()
{
	"_id" : 1078,
	"str" : "orchid",
	"num" : NumberLong(2178),
	"date" : ISODate("2019-04-22T20:50:00.826Z"),
	"array" : [
		NumberLong(32525),
		NumberLong(44047),
		ISODate("2019-06-13T02:41:44.348Z"),
		"Tools",
		"Grove Product",
		28213,
		ISODate("2019-03-13T20:33:11.273Z"),
		ISODate("2019-09-15T05:04:20.672Z"),
		ISODate("2019-11-11T06:16:33.307Z"),
		{
			"_id" : 1079,
			"str" : "Georgia Iraqi Dinar Lead",
			"num" : 69643,
			"date" : NaN,
			"obj" : {
				"_id" : 1080,
				"str" : "generate THX virtual",
				"num" : NumberLong(94448),
				"array" : [
					NumberDecimal("-643.826448748712"),
					[ ],
					NumberDecimal("272.28819709550726"),
					null,
					"e-tailers Mission"
				],
				"obj" : {
					"_id" : 1081,
					"str" : "envisioneer evolve Forks",
					"num" : NumberDecimal("-397.33322040453436"),
					"date" : ISODate("2019-07-05T14:28:07.010Z"),
					"obj" : {
 
					}
				}
			}
		}
	],
	"obj" : {
		"_id" : 1082,
		"str" : "Intranet",
		"num" : 74811,
		"array" : [
			{
				"_id" : 1083,
				"date" : ISODate("2019-05-15T16:46:30.286Z"),
				"array" : [
					{
						"_id" : 1084,
						"str" : null,
						"num" : 20959,
						"date" : null,
						"array" : [ ],
						"obj" : {
 
						}
					},
					null,
					NaN,
					ISODate("2019-05-13T14:55:24.201Z")
				]
			}
		],
		"obj" : {
			"_id" : 1085,
			"array" : [
				66718,
				"back-end intangible",
				[
					"Multi-tiered turquoise Credit Card Account",
					NumberDecimal("-4.27904291615323"),
					{
						"_id" : 1086,
						"date" : ISODate("2019-10-15T15:08:13.433Z"),
						"array" : [ ]
					}
				]
			],
			"obj" : {
				"_id" : 1087,
				"num" : NumberDecimal("-740.555235767481"),
				"date" : ISODate("2019-11-23T19:12:50.139Z")
			},
			"num" : NumberLong(0)
		}
	},
	"match" : [ ]
}
{
	"_id" : 469,
	"str" : "Tasty Rubber Pizza Bond Markets Units European Composite Unit (EURCO) robust",
	"num" : NumberLong(81616),
	"date" : ISODate("2019-08-05T01:32:15.829Z"),
	"array" : [
		ISODate("2019-01-28T14:08:07.624Z"),
		"New Jersey Consultant",
		23767,
		72540,
		ISODate("2019-12-28T19:30:44.661Z"),
		NumberLong(32526),
		"SQL",
		[ ],
		[ ],
		75037
	],
	"obj" : {
		"_id" : 470,
		"str" : "Checking Account",
		"num" : NumberLong(79901),
		"date" : ISODate("2019-02-14T12:40:50.831Z"),
		"array" : [
			"CSS navigate Singapore",
			null,
			{
				"_id" : 471,
				"str" : "quantify",
				"num" : NumberDecimal("-793.7706100048798"),
				"date" : ISODate("2019-02-20T16:39:48.120Z"),
				"array" : [
					NaN,
					NaN
				],
				"obj" : {
					"_id" : 472,
					"obj" : {
						"_id" : 473,
						"num" : 29622,
						"date" : ISODate("2019-09-20T13:10:23.122Z"),
						"array" : [
							null,
							ISODate("2019-10-15T02:26:25.924Z"),
							[ ]
						],
						"obj" : {
 
						}
					}
				}
			}
		],
		"obj" : {
			"_id" : 474,
			"date" : NaN,
			"array" : [ ],
			"obj" : {
				"_id" : 475,
				"str" : "Mobility",
				"num" : NumberDecimal("217.89759387808635")
			},
			"num" : NumberLong(0)
		}
	},
	"match" : [ ]
}



 Comments   
Comment by David Storch [ 15/Apr/21 ]

After some more investigation, it looks like the known failures involving $graphLookup are manifestations of other known issues. Most of them are SERVER-56102 as mentioned above. The failure involving $concat is part of SERVER-53900. I am closing this ticket as a duplicate.

If we see future failures of $graphLookup, I would expect them to be manifestations of other underlying problems, not necessarily caused by $graphLookup itself. I suspect we are seeing $graphLookup fail because this is one of the few ways that we hit SBE at the moment in the aggregation_multiversion_fuzzer due to SERVER-56083.

Comment by David Storch [ 14/Apr/21 ]

I just noticed that the $graphLookup involves $bitsAnyClear inside the restrictWithMatch match expression. Therefore, it seems likely that the differing results are due to SERVER-56102, which I have linked as related! We should have the same assignee work on this ticket and SERVER-56102, as they now appear to be closely related (probably duplicates).

We should still use this ticket to investigate the second test failure that I mentioned above.

Comment by David Storch [ 14/Apr/21 ]

I'm realizing that I'm not sure how "l" and "r" map to 4.4 and 5.0 in my description above. So when this gets investigated, we should keep in mind that either SBE is returning extraneous results or has missing results. I'm not sure which.

Comment by David Storch [ 14/Apr/21 ]

As part of this ticket, we should also investigate this task failure. In this case, there is a $graphLookup with the "restrictWithMatch" option. An expression inside "restrictWithMatch" throws an exception when running against 5.0 (SBE on) but the query succeeds when running against 4.4 (SBE off).

Generated at Thu Feb 08 05:38:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.