[SERVER-10472] find(a).sort(b,_id) doesn't use index(a,b) to minimize scans Created: 09/Aug/13 Updated: 18/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 2.4.4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Tim Haines | Assignee: | Ben Shteinfeld |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | query-44-grooming, query-offsite, query_triage | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
OSX, MongoLabs |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Query Optimization
|
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
I have a collection of documents with keys _id, a, and b. I have an index {a:1,b:1}. When I do find(a).sort(b).limit(5) the index is used, and only 5 documents are scanned. When I do find(a).sort(b,_id).limit(5) (with or without a hint) it appears the index is only used to match a - as all documents with the matching value for a are scanned (nscanned and nscannedObjects are high). The behavior I would expect is that it would scan through a and b values enough to get to the limit (5), and then it would stop scanning when it found the next unique b (or a). It would then sort on _id within the scanned records. |
| Comments |
| Comment by Michael Longo [ 25/Feb/22 ] | |||||||||||||||||||||||||||||||||||
|
If I may add, I'm looking to upgrade my current mongodb 4.2 (yes I know, I'm late) and in the 4.4 releases note it states:
Which is kind of a pain due to this ticket. I'll basically have to update my indexes for a few corner cases duplicates that could be solved with a quick in-memory sort | |||||||||||||||||||||||||||||||||||
| Comment by Michael Longo [ 08/Feb/22 ] | |||||||||||||||||||||||||||||||||||
|
The use case can be simplified a lot. We don't need a compound index or any filter. Just:
is enough to reproduce the "issue". Even though only 10 documents were asked, it ends up examining and sorting the whole 500 documents even though it would in reality require no sort at all in this very situation. And even if we had duplicated entries, it should really only sort these duplicates instead of the whole collection. This is really a bummer for something that appears so simple and intuitive. Just adding my 2 cents hopping this would get noticed and maybe prioritized? | |||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 16/Aug/19 ] | |||||||||||||||||||||||||||||||||||
|
I'm changing the issuetype from "Bug" to "Improvement". Capturing the steps to reproduce, since they have a very helpful example of a query that would benefit from this suggested optimization:
| |||||||||||||||||||||||||||||||||||
| Comment by Asya Kamsky [ 16/May/18 ] | |||||||||||||||||||||||||||||||||||
|
This is not the same issue as the issue described in this ticket is having compound index which is used for both find and sort, but it's not used in the most efficient manner when there is a second field adding to the sort specification. Your issue is a known issue that two different indexes cannot be combined for find and for sort. If you observe that it works with one driver but not another, I believe something else is actually an issue and it would be easiest to debug if you open a new SERVER ticket for it (if it does not work via the shell and it should). If it works in one driver but not another then you should open a Jira ticket in the driver project (for the one it does not work in).
| |||||||||||||||||||||||||||||||||||
| Comment by Valentin Kuznetsov [ 23/Apr/18 ] | |||||||||||||||||||||||||||||||||||
|
For the record, I used MongoDB shell version v3.6.3 and MongoDB server version: 3.6.3 | |||||||||||||||||||||||||||||||||||
| Comment by Valentin Kuznetsov [ 23/Apr/18 ] | |||||||||||||||||||||||||||||||||||
|
Hi,
This will insert 1000000 docs. And, I am able to fetch the doc using my look-up query. But if I try to do that in mongo shell it fails:
Obviously the query used look-up index as well as index on sorted field. I don't think it is optimization issue, since I provided both indexed fields and rather an issue with search implementation. | |||||||||||||||||||||||||||||||||||
| Comment by Matt Kangas [ 04/Feb/15 ] | |||||||||||||||||||||||||||||||||||
|
This ticket is a request for a query optimization. Here is a brief outline what will be required to implement it. During query planning we determine if the requested sort is satisfied by an index. Today this is a binary decision. If yes, we can skip an explicit sort stage. If no, we choose a sort strategy and add it to the query plan. The optimization requested requires changing the answer from yes/no to yes/no/prefix. That is to say, the index partially satisfies the sort order. Later, during query execution, the sort stage would need to apply logic to stop consuming input documents from the index scan once the limit is reached, then proceed with a normal sort. This is a nontrivial amount of machinery to maintain and test. It is presently unclear that the result will be a significant win for many users, particularly when compared to other possible query optimizations. | |||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 22/Aug/13 ] | |||||||||||||||||||||||||||||||||||
|
Hi Tim, Thanks for bearing with me, I see what you're saying. There is potential for optimisation and we will give this further consideration. Thanks again, | |||||||||||||||||||||||||||||||||||
| Comment by Tim Haines [ 22/Aug/13 ] | |||||||||||||||||||||||||||||||||||
|
Hi David, If the query for a specifies one value, and the sort for b is the dominant sort option, then why can't the scanner just look at <limit> documents and then scan up to the next change in value of b? Can you provide a small dataset to illustrate why this doesn't work please? Thank you, Tim. | |||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 19/Aug/13 ] | |||||||||||||||||||||||||||||||||||
|
Thanks Tim, Is the index still on {a:1,b:1}? If so, then the same reasoning above will still apply. Given you are sorting on {b:1, _id:1}then Mongo will need to bring in all matching documents to sort them correctly to return the correct 3 documents (given sort order) as there is no other way to perform the sort on {b:1,_id:1}using the {a:1,b:1}index. You would need to add an index which includes _id to sort correctly without scanning all matching documents; even with the limit. For instance {a:1, b:1, _id:1}would allow you to match and sort in this manner without scanning. Regards, | |||||||||||||||||||||||||||||||||||
| Comment by Tim Haines [ 13/Aug/13 ] | |||||||||||||||||||||||||||||||||||
|
Hi David, I'm sorry, I screwed up the steps to reproduce (and it looks like I can't edit them now?). The steps to reproduce should have shown a sort by {b:1}in the first example and {b:1, _id:1}in the example showing the problem. In the problem case Mongo should realize that after it's found the limit of documents request, it only needs to keep scanning until it finds a change in the value of b. In the case of the problem example but with a sort of {b:1, _id:1}, it should only need to scan 4 documents. | |||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 12/Aug/13 ] | |||||||||||||||||||||||||||||||||||
|
Hi Tim, The issue in this is that MongoDB cannot assume that the _id values are sorted given the index of a,b. This means that when you ask for MongoDB to find three documents sorted by a and _id it must scan all 6 documents to confirm the sort order. Once it has confirmed which are the three relevant documents given sort order it will return a cursor to those three documents. This could be resolved by adding an index with the following definition:
With the explain result of:
Regards, |