[SERVER-12923] Plan ranking is bad for plans with blocking stages Created: 26/Feb/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.8, 2.6.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 12
Labels: 26qa, bonsai, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-13866 Wrong index is being picked Closed
is duplicated by SERVER-42733 Query not using index intersection Closed
Related
is related to SERVER-13831 Find with $gte/$gt/$lte/$lt on Date f... Closed
is related to SERVER-20619 Statistics-based query optimization Backlog
Assigned Teams:
Query Optimization
Operating System: ALL

 Description   

Currently we rank candidate plans by calling work() on each of them a fixed number of times and seeing which plan produces the most results.

A plan with a blocking stage will therefore almost always be ranked lower than plans without one, because non-blocking plans start producing results immediately. Although the plan with the blocking stage appears to make no progress during the trial, it might look much better, and hit EOF much faster, if the plans were run to completion.
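
To make the mechanism concrete, here is a minimal sketch of trial-based ranking in illustrative JavaScript. The real implementation is C++ inside the server; the function and stage-status names below are simplified assumptions, not the actual API:

// Hypothetical sketch only: each candidate plan is work()'d round-robin
// for a fixed trial period, and the plan that advances the most results
// during the trial wins.
function pickBestPlan(candidates, trialWorks) {
    var advanced = candidates.map(function () { return 0; });
    for (var i = 0; i < trialWorks; i++) {
        candidates.forEach(function (plan, idx) {
            // A blocking stage such as SORT returns NEED_TIME until it has
            // consumed its entire input, so it advances nothing during the
            // trial even if it would reach EOF first when run to completion.
            if (plan.work() === "ADVANCED") {
                advanced[idx]++;
            }
        });
    }
    return candidates[advanced.indexOf(Math.max.apply(null, advanced))];
}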

Blocking stages for which this is an issue:

  • SORT
  • AND_HASH

Here's one way to reproduce the issue, for the blocking SORT stage:

> t = db.t
test.t
> t.drop()
true
> t.ensureIndex({a: 1})
> t.ensureIndex({b: 1})
> for (var i = 0; i < 10000; i++) { t.save({a: 1}); }
WriteResult({ "nInserted" : 1 })
> for (var i = 0; i < 150; i++) { t.save({b: 1}); }
WriteResult({ "nInserted" : 1 })
> t.find({b: 1}).sort({a: 1}).explain()
{
	"cursor" : "BtreeCursor a_1",
	"isMultiKey" : false,
	"n" : 150,
	"nscannedObjects" : 10150,
	"nscanned" : 10150,
	"nscannedObjectsAllPlans" : 10347,
	"nscannedAllPlans" : 10348,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 85,
	"nChunkSkips" : 0,
	"millis" : 139,
	"indexBounds" : {
		"a" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		]
	},
	"server" : "localhost:27017",
	"filterSet" : false
}

The query ends up using index {a: 1}, which involves scanning all 10,150 documents and fetching them from disk if they're not in memory. We would have liked to use index {b: 1}, which would involve fetching only the 150 matching documents and then sorting them in memory.
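
Until ranking accounts for blocking stages, a hint forcing the selective index works around the problem in this reproduction (expected output is sketched in the comments; exact numbers will vary):

// Force the selective index; the blocking SORT then only has to consume
// the ~150 documents matching {b: 1}.
t.find({b: 1}).sort({a: 1}).hint({b: 1}).explain()
// Expect "cursor" : "BtreeCursor b_1", nscanned on the order of 150, and
// "scanAndOrder" : true, reflecting the in-memory sort.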



 Comments   
Comment by David Storch [ 11/May/15 ]

Hi bigbourin@gmail.com,

It's hard to say without seeing specific details whether or not this ticket is the root cause of your issue. I suggest opening a new SERVER ticket with the following details:

  • The version of your server. If you are on 2.6 before 2.6.9, you should consider upgrading to the latest patch release, as it contains fixes which may resolve some plan selection issues.
  • Any relevant slow log lines.
  • The output of getIndexes on the affected collection.
  • The output of running .explain(true) on the problematic query (example commands below).
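
For example, where mycoll and the query shape are placeholders for the affected collection and query:

// Placeholders only; substitute the real collection name and slow query.
db.mycoll.getIndexes()
db.mycoll.find({field: "value"}).sort({otherField: 1}).explain(true)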

This additional information will aid in the debugging process.

Best,
Dave

Comment by Adrien Jarthon [ 11/May/15 ]

We also experienced this issue on a big collection (3 million records), where a query iterated over all 3 million records instead of the 1 or 2 it would touch with the proper index.
It used the sort index instead of the very discriminating filter index, as in your example. Fortunately this happens rarely in our case, but when it does the requests take 3 minutes, which is bad.
We understand this is quite hard to improve and can already be worked around by hinting the query, but that is not a viable solution for us, as we have hundreds of different queries that hit this collection.
If you can do anything, that's great. Anyway, thanks for your work.
