[SERVER-16042] Optimise $all/$and to select smallest subset as initial index bounds Created: 10/Nov/14  Updated: 07/Sep/16  Resolved: 01/Oct/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 2.6.12, 3.0.7, 3.1.9

Type: Improvement Priority: Major - P3
Reporter: David Hows Assignee: David Storch
Resolution: Done Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-19999 $all with $elemMatch indexes first el... Closed
is duplicated by SERVER-20913 DateTime index ignored / horrible per... Closed
is duplicated by SERVER-3892 consider multiple candidate index sca... Closed
is duplicated by SERVER-12499 Unable to force predicate evaluation ... Closed
Related
related to SERVER-14264 Compound index on 2dsphere and dateti... Closed
related to SERVER-12281 When choosing multikey index bounds, ... Backlog
related to DOCS-8779 $all now choose optimal condition not... Closed
related to SERVER-18364 Ensure non-negation predicates get ch... Closed
is related to SERVER-1000 $all with query optimizer Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Sprint: Quint Iteration 3, QuInt A (10/12/15)
Participants:

 Description   

Currently clauses in a $all will be evaluated in the order that they are provided to the query. This can lead to cases where a superset is being searched to find values that are contained within a smaller subset of the wanted values.

Consider trying to find all documents that have tags showing that they are an Italian and have a Michelin star within a collection of restaurants. There will be a large number of documents with the tag "Italian" but very few with the tag "Michelin".

If the tags are entered into the $all clause in alphabetical order, the search will take longer as the larger set of "Italian" documents must be searched again for the "Michelin" tag. If the tag order is reversed, then we search the smaller set of "Michelin" matching documents to find those who match "Italian".

The optimisation here would be to remove the dependency on the order of documents in an $all clause.

Changes in this case should handle situations where the entities within the $all clause are not simply fields, but also documents which may have several levels of depth.



 Comments   
Comment by Githook User [ 18/Nov/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-16042 generate a candidate plan for each predicate over the leading field of a multikey index
Branch: v2.6
https://github.com/mongodb/mongo/commit/461c5d496e9e1896be4371478867117ec552d7a5

Comment by Githook User [ 01/Oct/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-16042 fix lint
Branch: v3.0
https://github.com/mongodb/mongo/commit/ea38cdc9c2aa753d7720913fe92d1763bfb1763e

Comment by Githook User [ 01/Oct/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-16042 generate a candidate plan for each predicate over the leading field of a multikey index

(cherry picked from commit 4372e7b826bbaff94d995c0a37d9a8bf29378e48)

Conflicts:
src/mongo/db/query/query_planner_array_test.cpp
src/mongo/db/query/query_planner_geo_test.cpp
src/mongo/db/query/query_planner_test.cpp
Branch: v3.0
https://github.com/mongodb/mongo/commit/e4964e0b77ee3d055946fb88bf37338a6a1906f1

Comment by Githook User [ 01/Oct/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-16042 generate a candidate plan for each predicate over the leading field of a multikey index
Branch: master
https://github.com/mongodb/mongo/commit/4372e7b826bbaff94d995c0a37d9a8bf29378e48

Comment by David Storch [ 26/Jan/15 ]

One more note: the query planner rewrites an $all to an $and. Everything described here for $all also applies to an $and.

Comment by David Storch [ 14/Nov/14 ]

Hi petrbela,

Great question. We closed SERVER-1000 at a time when the thinking was that index intersection would fully resolve this issue. However, we have since realized that index intersection is not sufficient. This ticket tracks our work on a more comprehensive fix, so please continue to watch for updates.

Here's why index intersection is not necessarily the answer for your "halloween" and "intercontinental" example. Currently, the query planner will consider two plans using the index on tags:

  1. Fetch all 4 million documents with the "halloween" tag, then filter by "intercontinental".
  2. Sort-based index intersection of the documents with "halloween" and the documents with "intercontinental".

The sort-based index intersection plan requires scanning 4 million "halloween" keys plus 4 thousand "intercontinental" keys, and also incurs some CPU overhead as it computes the intersection between these sets of keys. Although this is possibly cheaper than the plan that only scans "halloween" keys, it still can do a lot of unnecessary scanning.

The planner currently fails to generate and rank a plan which fetches the 4 thousand documents with the "intercontinental" tag and then filters by "halloween". This is probably the best plan, but it is never even considered. One of a few fixes that have been discussed by our dev team is to simply generate and rank additional plans, in this case ranking the plan that scans "intercontinental" keys in addition to the one that scans "halloween" keys.

Best,
Dave

Comment by Petr Bela [ 14/Nov/14 ]

We've just run into the same issue.

db.assets.find({ tags: { $all: ['halloween', 'intercontinental'] } })

fetches all 'halloween' documents (about 4 million in our case) and then scans them for 'intercontinental' (around 4000).

I was wondering if the index intersection was supposed to fix that? This comment https://jira.mongodb.org/browse/SERVER-1000?focusedCommentId=473980&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-473980 seems to imply that it has been resolved, but as of 2.6.5, it still behaves as described above. Or is index intersection not yet (ever?) supported for array fields?

Generated at Thu Feb 08 03:39:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.