[SERVER-40597] Avoid performing full scans of multikey indices when a SORT stage is required Created: 11/Apr/19  Updated: 27/Oct/23  Resolved: 17/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Chris Harris Assignee: Timour Katchaounov
Resolution: Gone away Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: QO 2022-02-07
Participants:

 Description   

Due to the changes in SERVER-19402, it is no longer correct for a multikey index scan to provide a non-blocking sort. Therefore, in general, a multikey index should not be used for operations if it is only eligible for consideration based on the requested sort.

This appears to already be the case for aggregations as a COLLSCAN is performed:

> db.demo.explain().aggregate([{$sort:{x:1}}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "test.demo",
					"indexFilterSet" : false,
					"parsedQuery" : {
						
					},
					"queryHash" : "8B3D4AB8",
					"winningPlan" : {
						"stage" : "COLLSCAN",
						"direction" : "forward"
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$sort" : {
				"sortKey" : {
					"x" : 1
				}
			}
		}
	],
	"ok" : 1
}

However find commands appear to generate a plan with "[MinKey, MaxKey]"  index bounds:

> db.demo.explain().find().sort({x:1})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.demo",
		"indexFilterSet" : false,
		"parsedQuery" : {
			
		},
		"queryHash" : "B7791BAD",
		"winningPlan" : {
			"stage" : "SORT",
			"sortPattern" : {
				"x" : 1
			},
			"inputStage" : {
				"stage" : "SORT_KEY_GENERATOR",
				"inputStage" : {
					"stage" : "FETCH",
					"inputStage" : {
						"stage" : "IXSCAN",
						"keyPattern" : {
							"x" : 1
						},
						"indexName" : "x_1",
						"isMultiKey" : true,
						"multiKeyPaths" : {
							"x" : [
								"x"
							]
						},
						"isUnique" : false,
						"isSparse" : false,
						"isPartial" : false,
						"indexVersion" : 2,
						"direction" : "forward",
						"indexBounds" : {
							"x" : [
								"[MinKey, MaxKey]"
							]
						}
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "Chriss-MacBook-Pro.local",
		"port" : 27017,
		"version" : "4.1.4",
		"gitVersion" : "2f4b5918497b09a226a3ec5dcff930edd52ea1e9"
	},
	"ok" : 1
}



 Comments   
Comment by Timour Katchaounov [ 17/Jan/22 ]

This issue has been addressed by SERVER-31898.

Comment by Timour Katchaounov [ 17/Jan/22 ]

With SERVER-31898 two kinds of plans are generated:

  1. If there is no condition that can be used as an index bound, then we generate a full index scan that satisfies the sort (and there is no sort). If the index scan is non-covering, there is a fetch. This plan may be cheaper than doing a collection scan + sort, depends on the difference of the cost of sorting the whole collection vs fetching the whole collection. Currently we seem to think that sorting is more expensive than fetching so we choose a plan that avoids sorting. This is not necessarily true.
  2. If there is a condition that can be used as an index bound there are two choices:
    1. Push the condition as an index boundary, then fetch, then sort,
      That is: IXSCAN[LowKey, HighKey] -> Fetch -> Sort, or
    2. Full index scan, then fetch and filter (and no sort)
      That is: IXSCAN[MinKey, MaxKey] -> Fetch -> Filter (condition)

I believe that SERVER-31898 addresses the problem, and this bug should be closed as david.storch suggests. The logic called from the block "if (!usingIndexToSort) ..." eventually ends up testing each index scan via confirmBoundsProvideSortComponentGivenMultikeyness implemented by SERVER-31898, which allows only full index scans against multikey indexes that satsify the sort.

Comment by David Storch [ 08/Jan/21 ]

david.percy I don't remember all the details from SERVER-31898, but based on a quick review of this ticket I think that closing it probably makes sense. I would argue we should close it as "Gone Away" and related to SERVER-31898 rather than as a dupe, but that's a detail.

A question, though. Were the problematic plans being generated by this logic? And is our reasoning that since this logic always creates plans which scan the entire index from beginning to end, that such plans will now always be deemed capable of providing the requested sort regardless of multikeyness? Assuming I'm getting this all right, closing as discussed above sounds good.

Generated at Thu Feb 08 04:55:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.