[SERVER-40599] Consider scanning multikey indices for sorting on arrays and limit:1 Created: 11/Apr/19  Updated: 09/Jul/19  Resolved: 09/Jul/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Chris Harris Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-31898 Improve sort analysis to allow multik... Closed
Sprint: Query 2019-07-15
Participants:
Case:

 Description   

As a special case to SERVER-40597, a multikey index could be used if the operation includes limit:1.  This optimization may not extend meaningfully for other values applied to the limit.



 Comments   
Comment by David Storch [ 09/Jul/19 ]

I don't think there is anything special about the limit:1 case that permits a non-blocking sort plan using a multikey index. Consider the following example:

MongoDB Enterprise > db.c.drop()
true
MongoDB Enterprise > db.c.createIndex({a: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
MongoDB Enterprise > db.c.insert({a: [-2, 2]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.insert({a: [-1, 1]})
WriteResult({ "nInserted" : 1 })
MongoDB Enterprise > db.c.find({a: {$gt: 0}}).sort({a: 1})
{ "_id" : ObjectId("5d24d1ed89e5a05b7cc46743"), "a" : [ -2, 2 ] }
{ "_id" : ObjectId("5d24d1fe89e5a05b7cc46744"), "a" : [ -1, 1 ] }
MongoDB Enterprise > db.c.find({a: {$gt: 0}}).sort({a: 1}).limit(1)
{ "_id" : ObjectId("5d24d1ed89e5a05b7cc46743"), "a" : [ -2, 2 ] }

You can see that both documents match the "a > 0" predicate, because both arrays contain a positive number. According to the rules for array sorting as defined in SERVER-19402, the sort key for document {a: [-2, 2]} is -2 and the sort key for document {a: [-1, 1]} is -1. Therefore, the -2 document comes first in an ascending sort. You can see from the explain output that this query is using an explicit SORT stage rather than getting a non-blocking sort from an index scan:

{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.c",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"a" : {
				"$gt" : 0
			}
		},
		"queryHash" : "4E7A4FB8",
		"planCacheKey" : "89D85B86",
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"a" : 1
			},
			"limitAmount" : 1,
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"a" : 1
						},
						"indexName" : "a_1",
						"isMultiKey" : true,
						"multiKeyPaths" : {
							"a" : [
								"a"
							]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"a" : [
								"(0.0, inf.0]"
							]
						}
					}
				}
			}
		},
		"rejectedPlans" : [
			{
				"stage" : "SORT",
				"sortPattern" : {
					"a" : 1
				},
				"limitAmount" : 1,
				"inputStage" : {
					"stage" : "SORT_KEY_GENERATOR",
					"inputStage" : {
						"stage" : "FETCH",
						"filter" : {
							"a" : {
								"$gt" : 0
							}
						},
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : true,
							"multiKeyPaths" : {
								"a" : [
									"a"
								]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[MinKey, MaxKey]"
								]
							}
						}
					}
				}
			}
		]
	},
	"serverInfo" : {
		"host" : "storchbox",
		"port" : 27017,
		"version" : "0.0.0",
		"gitVersion" : "unknown"
	},
	"ok" : 1
}

If we were to implement the suggested optimization, the plan would instead be an index scan on index {a: 1} using the bounds (0, Infinity]. The first key encountered would be "1", which would result in document {a: [-1, 1]} being incorrectly returned as the result of the query. As described in SERVER-31898, this plan would only be correct if the index bounds chosen were [MinKey, MaxKey]. While the planner could generate an unbounded IXSCAN plan, this could lead to poor performance due to scanning a large number of index keys.

Since the limit:1 case isn't special, I'm closing this ticket as a duplicate of SERVER-31898, which describes the cases in which it is correct for a multikey index to provide a non-blocking sort.

Comment by Asya Kamsky [ 15/Apr/19 ]

Oh never mind I see you specifically call our limit 1 and not all limits.

Comment by Asya Kamsky [ 15/Apr/19 ]

Is that correct? Multiple index entries can point to the same document so limit applies to number of documents and not number of index entries and aren’t those deduped later in the process than when limit could be helpful?

Generated at Thu Feb 08 04:55:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.