Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-2617

Indexing arrays and embedded JSON objects - use cardinality in query on multiple elements instead of first element in the match criteria (e.g. $all operator)

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Index Maintenance
    • None

      Related to this discussion (long, look at the end for most of the substance):

      https://groups.google.com/d/topic/mongodb-user/mr1uz9PuEek/discussion

      Currently, MongoDB uses the first element in the $all operator to find the index and return the documents that are then scanned for the matches for the rest of the match criteria.
      Here is an example of the documents in the collection and a query, as well as the explain info:

      > r.ensureIndex(

      {records:1}

      )
      > r.find({records:{ $all:[

      {custid:456}

      ,

      {status:200}

      ] }})
      { "_id" : ObjectId("4d6574f485904ac21ab43fca"), "bytes" : 10, "id" : 1, "records" : [

      { "status" : 200 }

      ,

      { "url" : "yahoo.com" }

      ,

      { "custid" : 456 }

      ] }
      { "_id" : ObjectId("4d6574fa85904ac21ab43fcb"), "bytes" : 10, "id" : 1, "records" : [

      { "status" : 200 }

      ,

      { "url" : "cnn.com" }

      ,

      { "custid" : 456 }

      ] }

      > r.find({records:{ $all:[

      {custid:456}

      ,

      {status:200}

      ] }}).explain()
      {
      "cursor" : "BtreeCursor records_1",
      "nscanned" : 4,
      "nscannedObjects" : 4,
      "n" : 2,
      "millis" : 0,
      "indexBounds" : {
      "records" : [
      [

      { "custid" : 456 }

      ,

      { "custid" : 456 }

      ]
      ]
      }
      }

      According to how it was explained, the index used in this case (and similar queries) is always that of the first element of the query match array ($all), in this case on "custid" field, and then the documents returned from that index are scanned for matching of the rest of the query match terms (status:200 in this case). Cardinality of the indexes of the indexes is not taken into account when selecting which index to apply for the query - it is always just the index of the first field specified in $all.

      A couple suggestions

      • evaluate all the query fields and select the most effective index (e.g. based on cardinality) instead of always the 1st element index.
      • use indexes for each step of the evaluation, i.e. after returning the first set of documents use index again to match against "status":200 instead of performing the document scan for all subsequent elements.

      It would greatly enhance the effectiveness of performing queries on arbitrary number of embedded array elements.

            Assignee:
            Unassigned Unassigned
            Reporter:
            alexatl Aleksei Tolompoiko
            Votes:
            5 Vote for this issue
            Watchers:
            8 Start watching this issue

              Created:
              Updated:
              Resolved: