[SERVER-1000] $all with query optimizer Created: 12/Apr/10  Updated: 02/Mar/18  Resolved: 20/Dec/13

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: hari.khalsa@10gen.com
Resolution: Done Votes: 44
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-2617 Indexing arrays and embedded JSON obj... Closed
Related
related to SERVER-5331 Use heuristics to choose selective in... Backlog
related to SERVER-16042 Optimise $all/$and to select smallest... Closed
is related to SERVER-3071 Index Intersection Closed
Participants:

 Description   

Right now, an $all query only uses the first element of the $all array to look up the index. We could instead use the query optimizer to try out all, or some number, of the elements.



 Comments   
Comment by hari.khalsa@10gen.com [ 20/Dec/13 ]

We handle this via index intersection.
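
As a rough illustration of what index intersection means for an $all query, here is a minimal sketch, not the server's actual implementation. It assumes each term's index entries can be read as a sorted array of document ids (a "posting list"); the function names and the sample ids are hypothetical.

```javascript
// Illustrative only: each posting list is a sorted array of document ids,
// standing in for the index entries of one $all term.
function intersectSorted(a, b) {
  const out = [];
  let i = 0;
  let j = 0;
  // Walk both sorted lists in step, keeping ids present in both.
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push(a[i]);
      i++;
      j++;
    } else if (a[i] < b[j]) {
      i++;
    } else {
      j++;
    }
  }
  return out;
}

function intersectAll(lists) {
  if (lists.length === 0) return [];
  // Start from the shortest list so intermediate results stay small.
  const ordered = [...lists].sort((x, y) => x.length - y.length);
  return ordered.reduce((acc, list) => intersectSorted(acc, list));
}

// Hypothetical sample data: ids of documents containing each tag.
const postings = {
  jornalismo: [1, 2, 3, 5, 8, 13, 21],
  Atenas: [3, 13, 34],
};
console.log(intersectAll([postings.jornalismo, postings.Atenas])); // [ 3, 13 ]
```

With this approach the order of the terms in the $all array no longer matters, because only documents present in every list are fetched.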

Comment by Antoine Girbal [ 25/Apr/13 ]

Was just thinking about this: since we now have a much faster count() operation on a clean range, why not use it to improve $all?
Mongo can first iterate over the elements of $all and, if it can make use of the fast count, check which one matches the fewest documents.
Most of the time the matches in an $all are equalities on fields covered by an index, so the optimization will be available.

Comment by auto [ 29/Oct/12 ]

Author: Sam Kleinman <samk@10gen.com>, 2012-10-29T13:33:08-07:00

Message: merge: SERVER-1000
Branch: master
https://github.com/mongodb/docs/commit/d5aa64d9ab19f10097255a3df7b4c07175641bb3

Comment by auto [ 29/Oct/12 ]

Author: giveturtle <yanir@giveable.co>, 2012-10-26T03:27:48-07:00

Message: Added comment about $all's inefficiency

See issue https://jira.mongodb.org/browse/SERVER-1000
Branch: master
https://github.com/mongodb/docs/commit/14fe6b73933026825d36b6699f987150b6b33214

Comment by DisU [ 26/Oct/12 ]

Why isn't this bug part of the Mongo documentation? It's been known for over a year and a half, and since it's not documented I just assumed that doing $all queries on indexed fields would be fast because the most trivial way of implementing it is by doing index intersection.

It would be better to mark $all as not implemented, because that's its current status. It's completely useless, as there are absolutely no guarantees it would ever finish (I mean ever in the web sense of user requests timing out).

Comment by Sam Martin [ 10/Oct/12 ]

I've done exactly that, works ok, until you want to search part of a term.
see https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/xpUiCmsedPY

Comment by yudi.zhao [ 08/Sep/11 ]

So glad that someone has already raised this issue; it has bothered me for quite a while.
I just planned to build my own tag dictionary holding the approximate counts.
Would this be in v2.1?
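
A tag dictionary of the kind mentioned here could look roughly like the sketch below. It is only an assumption about how such a helper might be written: the class name `TagCounts` and its methods are hypothetical, the counts live in application memory, and they are approximate in the sense that they only reflect writes that went through this helper.

```javascript
// Hypothetical application-side dictionary: tag -> number of documents
// carrying that tag. It is kept up to date by calling add()/remove()
// whenever a document is inserted or deleted.
class TagCounts {
  constructor() {
    this.counts = new Map();
  }

  // Record a newly inserted document; each distinct tag counts once.
  add(tags) {
    for (const tag of new Set(tags)) {
      this.counts.set(tag, (this.counts.get(tag) || 0) + 1);
    }
  }

  // Record a deleted document; entries that reach zero are dropped.
  remove(tags) {
    for (const tag of new Set(tags)) {
      const next = (this.counts.get(tag) || 0) - 1;
      if (next > 0) {
        this.counts.set(tag, next);
      } else {
        this.counts.delete(tag);
      }
    }
  }

  // Unknown tags are reported as 0 documents.
  get(tag) {
    return this.counts.get(tag) || 0;
  }
}

const tagCounts = new TagCounts();
tagCounts.add(["jornalismo", "Atenas"]);
tagCounts.add(["jornalismo"]);
tagCounts.add(["jornalismo", "jornalismo"]); // duplicate tag counted once
console.log(tagCounts.get("jornalismo"), tagCounts.get("Atenas")); // 3 1
tagCounts.remove(["jornalismo", "Atenas"]);
console.log(tagCounts.get("jornalismo"), tagCounts.get("Atenas")); // 2 0
```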

Comment by Eliot Horowitz (Inactive) [ 19/Apr/11 ]

We should really just do an index intersection

Comment by Vicente Mundim [ 14/Mar/11 ]

I also ran into this issue. I have a collection of videos with a tags array attribute. Here is the explain() output of a query using one tag that has lots of videos and another that has just a few, in both orders:

> db.videos.find({tags: {$all: ['jornalismo', 'Atenas']}}).explain()
{
    "cursor" : "BtreeCursor tags_1",
    "nscanned" : 266996,
    "nscannedObjects" : 266996,
    "n" : 54,
    "millis" : 4938,
    "indexBounds" : {
        "tags" : [
            [
                "jornalismo",
                "jornalismo"
            ]
        ]
    }
}
> db.videos.find({tags: {$all: ['Atenas', 'jornalismo']}}).explain()
{
    "cursor" : "BtreeCursor tags_1",
    "nscanned" : 94,
    "nscannedObjects" : 94,
    "n" : 54,
    "millis" : 2,
    "indexBounds" : {
        "tags" : [
            [
                "Atenas",
                "Atenas"
            ]
        ]
    }
}

Here is the number of video documents for each tag:

> db.videos.find({tags: 'jornalismo'}).count()
266996
> db.videos.find({tags: 'Atenas'}).count()
94

It uses only the first tag when searching through the index. It would be better if it used the tag with the fewest documents, wouldn't it?
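
Until the server picks the term itself, one possible client-side workaround is to put the rarest term first before sending the query. The sketch below assumes the behavior shown in this comment (only the first $all element drives the index scan). `buildAllQuery` and `countFn` are hypothetical names; in the shell, `countFn` could wrap something like `db.videos.find({tags: term}).count()`.

```javascript
// Reorder the $all terms so the one matching the fewest documents comes
// first. countFn is a hypothetical callback: term -> number of documents.
function buildAllQuery(field, terms, countFn) {
  const ordered = terms
    .map((term) => ({ term, count: countFn(term) }))
    .sort((a, b) => a.count - b.count)
    .map((entry) => entry.term);
  return { [field]: { $all: ordered } };
}

// Counts taken from the comment above; a real countFn would query the server.
const counts = { jornalismo: 266996, Atenas: 94 };
const query = buildAllQuery("tags", ["jornalismo", "Atenas"], (t) => counts[t]);
console.log(JSON.stringify(query));
// {"tags":{"$all":["Atenas","jornalismo"]}}
```

The trade-off is one extra count per term before every query, so this only pays off when the terms differ widely in frequency.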

Comment by Matthias Götzke [ 19/Jan/11 ]

This is basically where a statistics feature would come in handy. The statistics for this specifically might be gained from 10gen's efforts to implement full-text search, because the indices used there would give you a statistic for search term frequency.

Comment by Gerad Suyderhoud [ 18/Jan/11 ]

Just ran into this today. It's especially annoying to code around because we don't know the cardinality of the terms when doing the query, but mongo does.

> db.contacts.find({ search: { $all: [ "chris", "olsen" ] } }).explain()
{
    "cursor" : "BtreeCursor search_1_type_1_recent_connection_date-1",
    "nscanned" : 1378,
    "nscannedObjects" : 1378,
    "n" : 6,
    "millis" : 722,
    "indexBounds" : { ...

> db.contacts.find({ search: { $all: [ "olsen", "chris" ] } }).explain()
{
    "cursor" : "BtreeCursor search_1_type_1_recent_connection_date-1",
    "nscanned" : 32,
    "nscannedObjects" : 32,
    "n" : 6,
    "millis" : 4,
    "indexBounds" : { ...
