[SERVER-29300] Consider allowing COUNT_SCAN plans when the $ne operator is present Created: 19/May/17  Updated: 06/Dec/22  Resolved: 02/Jun/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Chris Harris Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-27822 Count plans sometimes don't need a FETCH Closed
Assigned Teams:
Query
Participants:

 Description   

Currently count operations that use the $ne operator appear to be ineligible to take advantage of the COUNT_SCAN stage. Such operations currently have the COUNT, FETCH, and IXSCAN stages although no filters are applied in the FETCH stage. The server should be able to generate a COUNT_SCAN plan if there are no benefits to the current implementation.

> db.version()
3.5.7
> 
> db.col.explain().count({x:{$ne:1}})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.col",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"$nor" : [
				{
					"x" : {
						"$eq" : 1
					}
				}
			]
		},
		"winningPlan" : {
			"stage" : "COUNT",
			"inputStage" : {
				"stage" : "FETCH",
				"inputStage" : {
					"stage" : "IXSCAN",
					"keyPattern" : {
						"x" : 1
					},
					"indexName" : "x_1",
					"isMultiKey" : false,
					"multiKeyPaths" : {
						"x" : [ ]
					},
					"isUnique" : false,
					"isSparse" : false,
					"isPartial" : false,
					"indexVersion" : 2,
					"direction" : "forward",
					"indexBounds" : {
						"x" : [
							"[MinKey, 1.0)",
							"(1.0, MaxKey]"
						]
					}
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "",
		"port" : 27037,
		"version" : "3.5.7",
		"gitVersion" : "79c2c2d340982da8b669cd7c3aa6b958dbf56263"
	},
	"ok" : 1
}
> 
> db.col.createIndex({x:1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> 
> db.col.explain().count({x:1})
{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "test.col",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"x" : {
				"$eq" : 1
			}
		},
		"winningPlan" : {
			"stage" : "COUNT",
			"inputStage" : {
				"stage" : "COUNT_SCAN",
				"keyPattern" : {
					"x" : 1
				},
				"indexName" : "x_1",
				"isMultiKey" : false,
				"multiKeyPaths" : {
					"x" : [ ]
				},
				"isUnique" : false,
				"isSparse" : false,
				"isPartial" : false,
				"indexVersion" : 2,
				"indexBounds" : {
					"startKey" : {
						"x" : 1
					},
					"startKeyInclusive" : true,
					"endKey" : {
						"x" : 1
					},
					"endKeyInclusive" : true
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "",
		"port" : 27037,
		"version" : "3.5.7",
		"gitVersion" : "79c2c2d340982da8b669cd7c3aa6b958dbf56263"
	},
	"ok" : 1
}



 Comments   
Comment by Chris Harris [ 09/Jun/17 ]

Thanks for the detailed feedback kyle.suarez, justin.seyster, and david.storch . It is very interesting.

I am inclined to vote for only revisiting this as an improvement request after the implementation of SERVER-27822. If its performance at that time is indeed similar to COUNT_SCAN then this request will be effectively obsolete.

Comment by David Storch [ 02/Jun/17 ]

justin.seyster pointed out that in the case of multiple ranges, we could generate an OR plan whose children consisted of multiple COUNT_SCANs, each consisting of a single range. This "exploded" count scan might work similarly to the explodeForSort() planning path. In the case of explodeForSort(), we turn one IXSCAN into several IXSCANS in order to generate a SORT_MERGE plan, thus avoiding a blocking SORT stage.

While this would be an interesting improvement, it is not planned for the immediate future. charris, if you think this is important for our users, we could re-purpose this ticket to be about many-range COUNT_SCAN plans.

Comment by Kyle Suarez [ 02/Jun/17 ]

Hey christopher.harris,

This query is ineligible for the COUNT_SCAN optimization not because of something inherent about $ne but because the index bounds aren't a single range (here, we have the two intervals [MinKey, 1) and (1, MaxKey]).

After discussing in Morning Meeting, we're not sure if there's a benefit to allowing COUNT_SCAN on index scans with multiple intervals. One of the benefits of COUNT_SCAN is that it sets an end position on a storage engine index cursor, allowing it to scan all the way to the end. To allow for multiple intervals, we'd have to copy some of the re-seeking logic in the IXSCAN code, which sort of defeats the purpose of the fast count path. Hence, we're inclined to close the ticket as Won't Fix.

However, you'll be happy to know that anne.lim is working on SERVER-27822, which will eliminate an unnecessary FETCH on count plans if the FETCH doesn't have a filter. Thus, the query in this ticket would be optimized to COUNT - IXSCAN. I suspect that this covered IXSCAN will be quite close in performance to COUNT_SCAN, but some extra performance testing will need to be done to confirm.

Cheers,
Kyle

Generated at Thu Feb 08 04:20:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.