[SERVER-2617] Indexing arrays and embedded JSON objects - use cardinality in query on multiple elements instead of first element in the match criteria (e.g. $all operator) Created: 23/Feb/11  Updated: 15/Feb/13  Resolved: 29/Oct/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aleksei Tolompoiko Assignee: Unassigned
Resolution: Duplicate Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-1000 $all with query optimizer Closed
Participants:

 Description   

Related to this discussion (long, look at the end for most of the substance):

https://groups.google.com/d/topic/mongodb-user/mr1uz9PuEek/discussion

Currently, MongoDB uses the first element in the $all operator to find the index and return the documents that are then scanned for the matches for the rest of the match criteria.
Here is an example of the documents in the collection and a query, as well as the explain info:

> r.ensureIndex(

{records:1}

)
> r.find({records:{ $all:[

{custid:456}

,

{status:200}

] }})
{ "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [

{ "status" : 200 }

,

{ "url" : "yahoo.com" }

,

{ "custid" : 456 }

] }
{ "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [

{ "status" : 200 }

,

{ "url" : "cnn.com" }

,

{ "custid" : 456 }

] }

> r.find({records:{ $all:[

{custid:456}

,

{status:200}

] }}).explain()
{
"cursor" : "BtreeCursor records_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 2,
"millis" : 0,
"indexBounds" : {
"records" : [
[

{ "custid" : 456 }

,

{ "custid" : 456 }

]
]
}
}

According to how it was explained, the index used in this case (and similar queries) is always that of the first element of the query match array ($all), in this case on "custid" field, and then the documents returned from that index are scanned for matching of the rest of the query match terms (status:200 in this case). Cardinality of the indexes of the indexes is not taken into account when selecting which index to apply for the query - it is always just the index of the first field specified in $all.

A couple suggestions

  • evaluate all the query fields and select the most effective index (e.g. based on cardinality) instead of always the 1st element index.
  • use indexes for each step of the evaluation, i.e. after returning the first set of documents use index again to match against "status":200 instead of performing the document scan for all subsequent elements.

It would greatly enhance the effectiveness of performing queries on arbitrary number of embedded array elements.



 Comments   
Comment by Aaron Staple [ 29/Oct/12 ]

This feature is covered by SERVER-1000.

Generated at Thu Feb 08 03:00:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.