[SERVER-20616] Plan ranker sampling from the beginning of a query's execution can result in poor plan selection Created: 24/Sep/15  Updated: 30/Nov/23

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.11, 3.0.6, 3.2.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 11
Labels: bonsai
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-21178 Super slow query and increased memory... Closed
is duplicated by SERVER-82548 MongoDB 7.0.2 SBE selects a different... Closed
Related
related to SERVER-13211 Optimal index not chosen for query pl... Closed
related to SERVER-46904 Efficiency of one $or branch during p... Backlog
is related to SERVER-20619 Statistics-based query optimization Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   

The query engine currently selects a winning plan among n candidate plans by the following mechanism. We start executing each of the n plans for some short trial period. Once the trial period ends, we use stats collected from the brief trial execution to score the plans. The most highly scored plan becomes the winner. The scoring is based on a notion of "productivity", essentially the ratio of results produced per amount of work (where work is an artificial unit that is a proxy for CPU + IO time).

You can think of the trial execution period as a way to sample the data on the fly, a kind of adaptive query processing technique in which inferences about the data are made while the query is executing. The problem is that this sampling strategy may not accurately represent the true data distribution.

For example, suppose that some candidate plan p1 is scanning the index {a: 1} over the interval [0, 100000] in order to answer the query predicate {a: {$gte: 0, $lte: 100000}, b: "selectiveString"}. There is an alternative plan p2 which simply looks up all docs where "b" is equal to some rare string, "selectiveString", in the {b: 1} index. Clearly p2 is the better plan: instead of scanning some large index range and filtering out all the documents without the correct value for "b", we do a lookup based on the more selective predicate.

However, depending on the data, the plan ranking algorithm described above might lead us to choose suboptimal plan p1 instead of p2. Suppose that there is a correlation between the "a" and "b" fields so that all of the documents where "b" is "selectiveString" also have an "a" value of 0. Since p1 is scanning index {a: 1} in order, we encounter all the documents where "a" is 0 and "b" is "selectiveString" first during the trial execution period. This can spuriously make p1 look as good as p2! If instead we were to sample randomly from the data where "a" is on the interval [0, 10000], the query engine would have quickly observed that plan p1 is much slower than p2. Real data is often correlated, so despite the fairly contrived example, this can be a problem is practice.



 Comments   
Comment by Xiaochen Wu [ 18/Aug/22 ]

can be reproduced by:

 

coll = client.test.indextie

coll.delete_many({})

indexes = coll.index_information()
for key in indexes.keys():
    if key != "id":
        coll.drop_index(key)

coll.create_index([("a", pymongo.ASCENDING)])
coll.create_index([("b", pymongo.ASCENDING)])

data = []

for i in range(1000):
    data.append({"a": 1, "b": 1})

for i in range(1000):
    data.append({"a": 1, "b": i})

coll.insert_many(data)
print(coll.find({"a": 1, "b": 1}).explain())

 

it should use index b but it a was used because it is created first.

Comment by Hoyt Ren [ 18/Mar/16 ]

I tried v3.2.4, still see the problem.

Comment by Hoyt Ren [ 03/Jan/16 ]

I tried v3.2, the problem is still there.

Comment by Hoyt Ren [ 07/Dec/15 ]

Yes, it's a good direction to try. I believe the difficult is that don't affect the query performance much. Maybe, we can use (or put) some statics in (or into) index?

By the way, I found the issue about manually control the selection of index, but it seems a long time that nobody update it.

SERVER-7944

Comment by David Storch [ 04/Dec/15 ]

Here's one idea of how to fix this problem. We could extend the MultiPlanStage / CachedPlanStage trial period indefinitely. Instead of buffering the first batch of results and then throwing them out if we have to re-plan the query, we could keep a running estimate of index selectivity. If at any point during query execution the selectivity drops off, we could trigger a replan. This would probably involve buffering RecordIds for every document in the reset set so that we can dedup after a replan.

There would be a few details to work out. For example, how many RecordIds would we be willing to store for each query? How would we deal with sorts? We would probably have to remember the sort key of the last document returned and throw out any results from the new plan on the wrong side of the sort.

Comment by Hoyt Ren [ 18/Nov/15 ]

Hello friends,

Since making an AI isn't easy, I see some people here mentioned, that allow manually selecting the index as traditional DB, I can't remember the issue ID. Will this be a quicker fix? I believe here isn't problem because all us use SQL in so long a time if we ask user input some additional parameters. However if mongo achieve an AI, it will be a great achievement as many others we can see from today's mongo. Really thank you for your exciting works.

Generated at Thu Feb 08 03:54:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.