[SERVER-29227] $nin operation choose IXSCAN but not COLLSCAN stag Created: 16/May/17  Updated: 06/Dec/22  Resolved: 19/May/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: stone [X] Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13065 Consider a collection scan even if in... Backlog
Related
related to SERVER-23406 index scan is slower than full collec... Backlog
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

$nin always need to return most of the data.so collscan always more efficiency than index scan.as index scan need to query twice.
but have found mongo does not choose COLLSCAN for $nin operation.

have created a collection conn in mongo 3.4 env.

db.conn.insert({
sourceId:
targetId:
})
 
db.conn.createIndex({targetId:1})

but when I execute

db.conn.find({targetId: {$nin:['xxx']}}).explain("executionStats")

find that mongo choose IXSCAN instead of COLLSCAN stage.

when the document record is 100K,1M.
it always choose COLLSCAN.

and the result is that IXSCAN spend more time than COLLSCAN

COLLSCAN:

"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 999004,
		"executionTimeMillis" : 406,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 1000000,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"$not" : {
					"targetId" : {
						"$in" : [
							"abc1234"
						]
					}
				}
			},

INSCAN:

"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 999004,
		"executionTimeMillis" : 1863,
		"totalKeysExamined" : 999005,
		"totalDocsExamined" : 999004,
		"executionStages" : {

can anyone help explain this?



 Comments   
Comment by Asya Kamsky [ 19/May/17 ]

There in as assumption stated that's not necessarily correct:

$nin always need to return most of the data

Depending on the query and data distribution, $nin could be very selective, just like it's possible that "$in" will match/return majority of the data.

Regardless, there are two separate valid issues raised in this ticket:

One is that in some scenarios, IXSCAN can be slower than COLLSCAN - we are aware of this and looking to make improvements, tracked as SERVER-23406.

The other is that our query system does not consider a collection scan when there is an index available to satisfy a query. I'm going to close this ticket as duplicate of SERVER-13065 which tracks adding such ability to our query optimizer - in other words to consider a collection scan along-side with considering different indexed plans.

Comment by stone [X] [ 18/May/17 ]

thanks for your explanation,Kyle Suarez

Comment by Kyle Suarez [ 17/May/17 ]

Hi cnStoneFang,

Sorry for not adding a comment. The ticket is assigned to the Query Team but has a fixVersion of "Needs Triage", meaning we'll be considering this ticket during our next round of planning. We'll post an update then – please continue to watch this ticket for updates.

Regards,
Kyle

Comment by stone [X] [ 17/May/17 ]

why remove to backlog without any comments?

Generated at Thu Feb 08 04:20:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.