Core Server / SERVER-13780

sparsePolicy for sparse compound indexes

    • Type: New Feature
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Index Maintenance
    • Labels: None

      This patch was originally posted as a comment on SERVER-2193, but that ticket covers a somewhat different topic, so after discussing it with thomasr we agreed it should be a separate ticket. The original comment follows:

      As far as I can see, support for sparse indexes on multiple fields is already implemented in the current code base, and has been for a long time (with tests). There is a uassert in the code that is supposed to prevent users from creating a sparse index on multiple fields, but it is implemented incorrectly, so it never triggers.

      So if a user creates a sparse index on multiple fields, it works. The semantics of the current implementation are: "exclude a document from the index if all index fields are missing from the document".

      This "mode" of the index might benefit some users, but judging from many of the wishes in the discussion in this issue and in https://jira.mongodb.org/browse/SERVER-785, the semantics people are looking for are: "only include a document in the index if all index fields are present in the document".

      Supporting this second "mode" is simply a matter of changing the condition numNotFound == _spec._nFields to numNotFound != 0 here: https://github.com/mongodb/mongo/blob/master/src/mongo/db/indexkey.cpp#L429
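      The difference between the two conditions can be modelled outside the server. The following is a minimal JavaScript sketch, not the actual indexkey.cpp code: countMissing, skipCurrent and skipProposed are hypothetical names, field names are assumed to be top-level (no dotted paths), and array/multikey handling is ignored. A field explicitly set to null counts as present, as it does for sparse indexes.

```javascript
// Hypothetical JavaScript model of the sparse "skip this document" check.
// The real logic is C++ in src/mongo/db/indexkey.cpp; this is only a sketch.
// Assumes top-level field names only (no dotted paths, no arrays).

// Plays the role of numNotFound: how many index fields the document lacks.
// A field explicitly set to null counts as present.
function countMissing(doc, fields) {
  let numNotFound = 0;
  for (const f of fields) {
    if (!Object.prototype.hasOwnProperty.call(doc, f)) numNotFound++;
  }
  return numNotFound;
}

// Current semantics (numNotFound == _spec._nFields):
// skip the document only if all index fields are missing.
function skipCurrent(doc, fields) {
  return countMissing(doc, fields) === fields.length;
}

// Proposed semantics (numNotFound != 0):
// skip the document if any index field is missing.
function skipProposed(doc, fields) {
  return countMissing(doc, fields) !== 0;
}

const fields = ["a", "b"];
for (const doc of [{ a: 1, b: 2 }, { a: 1 }, { c: 3 }]) {
  console.log(JSON.stringify(doc), skipCurrent(doc, fields), skipProposed(doc, fields));
}
```

      For { a: 1 } the current check keeps the document (skipCurrent is false), while the proposed check drops it; both agree on the fully present and fully missing cases.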

      Provided that I have not missed any complicated corner case regarding this, I have the following suggestions:

      1. Change the documentation to make clear that sparse indexes on multiple fields are supported.
      2. Add an additional index option that can be used together with the sparse: true option to switch the index to the second semantics above.

      You can find the code/patch that does this (with a test case) here: https://github.com/johanhedin/mongo/commits/SERVER-2193

      With this patch you could create an index like this:

      db.collection.ensureIndex({ a: 1, b: 1 }, { sparse: true, sparsePolicy: "include" })
      

      and only documents where both a and b are present would be included in the index. If sparsePolicy is omitted (the default), the index works as before. Of course, the name sparsePolicy is just a suggestion.
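      To make the proposed option concrete, here is a hypothetical JavaScript model of the entries a { a: 1, b: 1 } index would hold under each setting. sparsePolicy is only the proposed option name, not an existing MongoDB option, and buildIndex is an illustrative helper. Missing fields are keyed as null, as in a regular (non-sparse) index, and the sketch assumes sparsePolicy is ignored unless sparse is true.

```javascript
// Hypothetical model of index contents under the proposed options.
// Not a MongoDB API: buildIndex and its option handling are illustrative only.

const has = (doc, f) => Object.prototype.hasOwnProperty.call(doc, f);

function buildIndex(docs, fields, { sparse = false, sparsePolicy } = {}) {
  const entries = [];
  for (const doc of docs) {
    const numPresent = fields.filter((f) => has(doc, f)).length;
    if (sparse) {
      // Current sparse behaviour: drop the document only if every field is missing.
      if (numPresent === 0) continue;
      // Proposed sparsePolicy "include": also drop it if any field is missing.
      if (sparsePolicy === "include" && numPresent !== fields.length) continue;
    }
    // Missing fields are stored as null in the key, as in a non-sparse index.
    entries.push({ key: fields.map((f) => (has(doc, f) ? doc[f] : null)), _id: doc._id });
  }
  return entries;
}

const docs = [
  { _id: 1, a: 1, b: 2 },
  { _id: 2, a: 5 },
  { _id: 3, c: 7 },
];
const fields = ["a", "b"];

for (const opts of [{}, { sparse: true }, { sparse: true, sparsePolicy: "include" }]) {
  const ids = buildIndex(docs, fields, opts).map((e) => e._id);
  console.log(JSON.stringify(opts), ids);
}
```

      With plain sparse: true, the document that only has a still costs an index entry with key [5, null]; with sparsePolicy "include" it does not, which is where the memory saving would come from.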

      I have only addressed v1 indexes, but the same change appears to be doable for v0 indexes as well, if desired.

      I'm happy to open a pull request if this is something you would consider. For my use case this would be a HUGE improvement: we are starting to scale from hundreds of millions to hundreds of billions of documents, and index RAM usage is a big issue, costing a lot of money for hardware that just stores "empty" values.

            Assignee:
            hari.khalsa@10gen.com
            Reporter:
            henrik.ingo@mongodb.com Henrik Ingo (Inactive)
            Votes:
            5
            Watchers:
            12

              Created:
              Updated:
              Resolved: