[SERVER-13780] sparsePolicy for sparse compound indexes Created: 29/Apr/14 Updated: 17/Apr/15 Resolved: 05/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Henrik Ingo (Inactive) | Assignee: | hari.khalsa@10gen.com |
| Resolution: | Duplicate | Votes: | 5 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
This patch was initially posted as a comment on another issue:

Support for sparse indexes with multiple fields is, as far as I can see, implemented in the current code base and has been for a long time (with written tests). There is a uassert in the code that is supposed to prevent a user from creating a sparse index with multiple fields, but it is implemented incorrectly and never fires. So if a user creates a sparse index with multiple fields, it will work.

The semantics of the current implementation are: "exclude a document from the index if all index fields are missing from the document". This "mode" of the index might benefit some, but according to many of the wishes in the discussion in this issue and in https://jira.mongodb.org/browse/SERVER-785, the semantics people are looking for are: "only include a document in the index if all index fields are present in the document". Getting support for this second "mode" is simply a matter of changing numNotFound == _spec._nFields to numNotFound != 0 here: https://github.com/mongodb/mongo/blob/master/src/mongo/db/indexkey.cpp#L429

Provided that I have not missed any complicated corner case, I have the following suggestions:

1. Change the documentation so that it is clear that sparse indexes with multiple fields are supported.

You can find the code/patch that does this (with test case) here: https://github.com/johanhedin/mongo/commits/SERVER-2193

With this patch you could create an index like this:
and only documents where both a and b are present will be included in the index. If sparsePolicy is left out (the default), the index will work as before. And of course, the name sparsePolicy is just a suggestion. I have only addressed v1 indexes, but the same seems to be doable for v0 indexes as well if that is desired. I'm happy to create a pull request if this is something you would consider.

For my use case, this would be a HUGE improvement: we are starting to scale from hundreds of millions of documents to hundreds of billions of documents, and RAM usage for our indexes is a big issue, costing a lot of money for hardware that just stores "empty" values. |
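As a rough illustration of the two modes described in the description, the sketch below models the inclusion check in plain JavaScript. It is not the server's C++ code: it only handles top-level fields, the helper names (countMissing, isIndexed) and the "anyField" label for the existing behaviour are made up here, and the shell command in the closing comment assumes a hypothetical collection named coll together with the sparsePolicy option that exists only in the patch.

```javascript
// Simplified model of the sparse-index inclusion check discussed above.
// Assumptions: top-level fields only (no dotted paths or arrays); the policy
// label "anyField" for the existing behaviour is invented for this sketch.
function countMissing(doc, fields) {
  return fields.filter(
    (f) => !Object.prototype.hasOwnProperty.call(doc, f)
  ).length;
}

function isIndexed(doc, fields, sparsePolicy) {
  const numNotFound = countMissing(doc, fields);
  if (sparsePolicy === "allFields") {
    // Proposed mode: skip the document if any indexed field is missing
    // (the document is excluded when numNotFound != 0).
    return numNotFound === 0;
  }
  // Existing mode: skip the document only if every indexed field is missing
  // (the document is excluded when numNotFound == number of index fields).
  return numNotFound !== fields.length;
}

const fields = ["a", "b"];
const docs = [{ a: 1, b: 2 }, { a: 1 }, { c: 3 }];

const current = docs.map((d) => isIndexed(d, fields, "anyField"));
const proposed = docs.map((d) => isIndexed(d, fields, "allFields"));

console.log(current);  // true, true, false
console.log(proposed); // true, false, false

// With the patch applied, the index would presumably be created in the mongo
// shell along these lines (collection name hypothetical, option unmerged):
//   db.coll.ensureIndex({ a: 1, b: 1 }, { sparse: true, sparsePolicy: "allFields" })
```

The document { a: 1 } is the interesting case: the existing behaviour keeps it in the index because only one of the two fields is missing, while "allFields" drops it, which is where the index-size savings would come from.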
| Comments |
| Comment by Henrik Ingo (Inactive) [ 05/Aug/14 ] |
|
SERVER-785 is a superset of this patch, and has now been scheduled into Planning Bucket A. |
| Comment by Henrik Ingo (Inactive) [ 29/Apr/14 ] |
|
To follow up on Thomas's comments re: SERVER-785: to review "what do people really want", I sampled a few emails from mongodb-user and mongodb-dev. Some people, of course, really do have ideas that need an arbitrary filtered index. But it seems many people could actually benefit from the proposed {sparsePolicy : "allFields"}. For example: DaveC is proposing a solution based on "allFields" semantics. (Asya then corrects him that what he proposes isn't currently possible.) Albert Kam suggests a usage that, as stated, would require filtering by value, but what he actually wants could be achieved with simple "allFields" semantics too. Clemo seems to ask for "allFields" semantics as well. (So in this sample of 3, 100% would actually be well served by this patch.) |
| Comment by Henrik Ingo (Inactive) [ 29/Apr/14 ] |
|
|