  1. Core Server
  2. SERVER-2675

Multikeys (indexing keys in a hash) for ranged queries

    • Query Optimization

      Mongo currently supports multikey indexing (where we index the values in an array). I suggest a feature for specifying an index on the keys and values of a given hash. This would be analogous to the common property bag pattern that is sometimes implemented in SQL datastores.

      properties (table)
      ==============
      id (int, PK)
      entity_id (int, FK)
      key (varchar)
      value (varchar)

      Sample data (date of birth for several different entities):
      INSERT INTO properties (entity_id, key, value) VALUES (101, 'dob', '12/15/1980')
      INSERT INTO properties (entity_id, key, value) VALUES (276, 'dob', '11/05/1973')
      INSERT INTO properties (entity_id, key, value) VALUES (333, 'dob', '11/05/1944')

      Index: key, value

      With this index in place, we can find everyone born in the 1970s.

      In Mongo, to accomplish the same thing you have to add an index directly on the 'dob' attribute of the document. This is undesirable, as our goal is to allow our documents (of any type) to share the same collection. (This is ideal for topic maps, social CMSes, and many other use cases, I'm sure.) It would be more useful if we could instead index the keys/values of a hash and perform ranged queries against those keys. Any property added to the property bag would be indexed. Powerful!

      INSERT INTO properties (entity_id, key, value) VALUES (101, 'iq', 83)
      INSERT INTO properties (entity_id, key, value) VALUES (276, 'iq', 120)
      INSERT INTO properties (entity_id, key, value) VALUES (333, 'iq', 103)

      Now we've added "IQ" to our entities. We can just as easily perform a greater-than query to grab the smartest people from our collection of whatevers.

      I believe this approach is superior to the current multikey approach suggested at:

      http://www.mongodb.org/display/DOCS/Using+Multikeys+to+Simulate+a+Large+Number+of+Indexes

      The future seems to be all about schema-less DBs. Indexing a hash of keys/values seems totally in tune with what it means to be schema-less. Also, you should be able to use hash indexing as part of a compound index. This would be especially important as one of the indexables would probably be type.

      db.whatevers.index({type: 1, attributes: {$hidx: 1}}) // $hidx = hash (key/value) index
      db.whatevers.find({'type': 'person', 'attributes.iq': {$gt: 100}}) //uses index (not scan)
      db.whatevers.find({'type': 'person', 'attributes.dob': {$gte: '1/1/1970', $lte: '12/31/1970'}}) //uses the same index
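
To make the proposal concrete, here is a minimal sketch in plain JavaScript of how such an index could behave. It assumes one sorted entry of (type, key, value) per property in the hash, and a range query that seeks to the lower bound and scans forward. None of this is MongoDB code: the function names (buildHashIndex, rangeQuery, and so on), the inclusive bounds, the numbers-before-strings ordering, and the sample names and values are all assumptions made for illustration.

```javascript
// Hypothetical sketch only; these helpers are not part of MongoDB.

// Order values by type first (numbers before strings), then by value.
function compareValues(a, b) {
  const rank = (v) => (typeof v === "number" ? 0 : 1);
  if (rank(a) !== rank(b)) return rank(a) - rank(b);
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compound ordering: type, then property key, then property value.
function compareEntries(x, y) {
  return (
    compareValues(x.type, y.type) ||
    compareValues(x.key, y.key) ||
    compareValues(x.value, y.value)
  );
}

// Emit one index entry per property in doc.attributes.
// The array position stands in for the document's _id.
function buildHashIndex(docs) {
  const entries = [];
  docs.forEach((doc, id) => {
    for (const [key, value] of Object.entries(doc.attributes || {})) {
      entries.push({ type: doc.type, key, value, id });
    }
  });
  return entries.sort(compareEntries);
}

// Binary search: first position whose entry is >= probe.
function lowerBound(entries, probe) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareEntries(entries[mid], probe) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan: ids of documents of the given type where
// min <= attributes[key] <= max (both bounds inclusive).
function rangeQuery(entries, type, key, min, max) {
  const ids = [];
  const start = lowerBound(entries, { type, key, value: min });
  for (let i = start; i < entries.length; i++) {
    const e = entries[i];
    if (e.type !== type || e.key !== key) break;
    if (compareValues(e.value, max) > 0) break;
    ids.push(e.id);
  }
  return ids;
}

// Made-up sample documents sharing one collection.
const docs = [
  { type: "person", attributes: { name: "Jason", iq: 99 } },
  { type: "person", attributes: { name: "Ann", iq: 120 } },
  { type: "person", attributes: { name: "Bob", iq: 103 } },
  { type: "place", attributes: { name: "Paris", iq: 150 } },
];

const index = buildHashIndex(docs);
const smart = rangeQuery(index, "person", "iq", 100, Infinity);
console.log(smart); // ids 2 and 1 (iq 103, then iq 120)
```

Because the entries are ordered by type first, a query on a different property of the same type (for example 'attributes.dob') would reuse the same structure, which is the point of the compound form above.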

      The document might look like:
      {
        type: "Person",
        attributes: { name: "Jason", dob: '12/01/1950', iq: 99 }
      }

      Attributes expressed in this manner seem more natural than:

      {
        type: "Person",
        attributes: [
          {name: "Jason"},
          {dob: '12/01/1950'},
          {iq: 99}
        ]
      }

      Furthermore, as far as I can tell, the multikey style doesn't support ranged queries.

      Thanks for reading. MongoDB is great work!

            Assignee:
            backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter:
            mlanza Mario T. Lanza
            Votes:
            3
            Watchers:
            8