[SERVER-16042] Optimise $all/$and to select smallest subset as initial index bounds Created: 10/Nov/14 Updated: 07/Sep/16 Resolved: 01/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | None |
| Fix Version/s: | 2.6.12, 3.0.7, 3.1.9 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Hows | Assignee: | David Storch |
| Resolution: | Done | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||||||
| Backport Completed: | |||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Quint Iteration 3, QuInt A (10/12/15) | ||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Currently clauses in a $all will be evaluated in the order that they are provided to the query. This can lead to cases where a superset is being searched to find values that are contained within a smaller subset of the wanted values. Consider trying to find all documents that have tags showing that they are an Italian and have a Michelin star within a collection of restaurants. There will be a large number of documents with the tag "Italian" but very few with the tag "Michelin". If the tags are entered into the $all clause in alphabetical order, the search will take longer as the larger set of "Italian" documents must be searched again for the "Michelin" tag. If the tag order is reversed, then we search the smaller set of "Michelin" matching documents to find those who match "Italian". The optimisation here would be to remove the dependency on the order of documents in an $all clause. Changes in this case should handle situations where the entities within the $all clause are not simply fields, but also documents which may have several levels of depth. |
| Comments |
| Comment by Githook User [ 18/Nov/15 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: | |
| Comment by Githook User [ 01/Oct/15 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: | |
| Comment by Githook User [ 01/Oct/15 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: (cherry picked from commit 4372e7b826bbaff94d995c0a37d9a8bf29378e48) Conflicts: | |
| Comment by Githook User [ 01/Oct/15 ] | |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: | |
| Comment by David Storch [ 26/Jan/15 ] | |
|
One more note: the query planner rewrites an $all to an $and. Everything described here for $all also applies to an $and. | |
| Comment by David Storch [ 14/Nov/14 ] | |
|
Hi petrbela, Great question. We closed Here's why index intersection is not necessarily the answer for your "halloween" and "intercontinental" example. Currently, the query planner will consider two plans using the index on tags:
The sort-based index intersection plan requires scanning 4 million "halloween" keys plus 4 thousand "intercontinental" keys, and also incurs some CPU overhead as it computes the intersection between these sets of keys. Although this is possibly cheaper than the plan that only scans "halloween" keys, it still can do a lot of unnecessary scanning. The planner currently fails to generate and rank a plan which fetches the 4 thousand documents with the "intercontinental" tag and then filters by "halloween". This is probably the best plan, but it is never even considered. One of a few fixes that have been discussed by our dev team is to simply generate and rank additional plans, in this case ranking the plan that scans "intercontinental" keys in addition to the one that scans "halloween" keys. Best, | |
| Comment by Petr Bela [ 14/Nov/14 ] | |
|
We've just run into the same issue.
fetches all 'halloween' documents (about 4 million in our case) and then scans them for 'intercontinental' (around 4000). I was wondering if the index intersection was supposed to fix that? This comment https://jira.mongodb.org/browse/SERVER-1000?focusedCommentId=473980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-473980 seems to imply that it has been resolved, but as of 2.6.5, it still behaves as described above. Or is index intersection not yet (ever?) supported for array fields? |