[SERVER-31838] Query planner generates DISTINCT_SCAN without FETCH plan for distinct on dotted field Created: 06/Nov/17  Updated: 06/Dec/22  Resolved: 29/Dec/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.6.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14832 Distinct command result set may inclu... Needs Scheduling
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

When dotted field plan is generated for distinct command, it uses DISTINCT_SCAN without FETCH.

db.orders.find()
{ "_id" : ObjectId("5a006024e946f6d003f4cf93"), "lineitems" : { "returnflag" : "A" } }
{ "_id" : ObjectId("5a006028e946f6d003f4cf94"), "lineitems" : { "returnflag" : "B" } }
{ "_id" : ObjectId("5a006383e946f6d003f4cf96"), "lineitems" : { "otherflag" : 1 } }
/* without indexes */
db.orders.distinct("lineitems.returnflag")
 [ "A", "B" ]
db.orders.createIndex({"lineitems.returnflag":1, "lineitems.linestatus" : 1})
/* ok */
db.orders.distinct("lineitems.returnflag")
[ null, "A", "B" ]
/* make index multikey */
db.orders.insert({lineitems:[ {returnflag:"A"}, {returnflag:"B"} ]})
WriteResult({ "nInserted" : 1 })
db.orders.distinct("lineitems.returnflag")
[ "A", "B" ]

Note in case of using non-multikey index we show null (for missing).

On nonMultikey index explain was:

		"winningPlan" : {
			"stage" : "PROJECTION",
			"transformBy" : {
				"_id" : 0,
				"lineitems.returnflag" : 1
			},
			"inputStage" : {
				"stage" : "DISTINCT_SCAN",
				"keyPattern" : {
					"lineitems.returnflag" : 1,
					"lineitems.linestatus" : 1
				},
				"indexName" : "lineitems.returnflag_1_lineitems.linestatus_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"lineitems.returnflag" : [ ],
					"lineitems.linestatus" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"direction" : "forward",
				"indexBounds" : {
					"lineitems.returnflag" : [
						"[MinKey, MaxKey]"
					],
					"lineitems.linestatus" : [
						"[MinKey, MaxKey]"
					]
				}
			}
		},

hence picking up the null index value (which represented missing).

After the insert, it's:

		"winningPlan" : {
			"stage" : "COLLSCAN",
			"direction" : "forward"
		},

This change seems due to SERVER-2104 or SERVER-3173?
It's a correctness regression. Here's plan from 3.4.6 which correctly does not return NULL:

		"winningPlan" : {
			"stage" : "PROJECTION",
			"transformBy" : {
				"_id" : 0,
				"lineitems.status" : 1
			},
			"inputStage" : {
				"stage" : "FETCH",
				"inputStage" : {
					"stage" : "DISTINCT_SCAN",
					"keyPattern" : {
						"lineitems.status" : 1,
						"lineitems.date" : 1
					},
					"indexName" : "lineitems.status_1_lineitems.date_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"lineitems.status" : [ ],
						"lineitems.date" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"lineitems.status" : [
							"[MinKey, MaxKey]"
						],
						"lineitems.date" : [
							"[MinKey, MaxKey]"
						]
					}
				}
			}
		},

3.4 correctly returns no "null" with or without the index.



 Comments   
Comment by Asya Kamsky [ 06/Nov/17 ]

Interesting, I'd forgotten about SERVER-14832 - I agree this is similar and no worse than that, though empty arrays are probably less common than missing values for dotted fields.

This ticket can probably be either closed as dupe of that one, or kept open if solving null vs missing is different than undefined vs missing.

Comment by David Storch [ 06/Nov/17 ]

asya, I believe this to be a duplicate of SERVER-14832. Certainly the issue can now occur where it could not before due to SERVER-2104, but it's hard to view this as a new issue given SERVER-14832. I'm also not sure that we should block the changes for SERVER-2104 on the resolution of SERVER-14832.

Generated at Thu Feb 08 04:28:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.