[SERVER-2193] Sparse indexes only support a single field Created: 08/Dec/10 Updated: 17/Apr/15 Resolved: 08/Jan/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Sparse indexes doesn't support multiple fields, for compound indexes. If any field (in the index def) is missing, the document should not be in the index. |
| Comments |
| Comment by Jeremy Haile [ 09/Oct/14 ] | ||||||
|
I have the situation where one of my columns is null when I first create it, but may get set to an ID later. And when it's set to an ID it needs to be unique with another column. Unfortunately I can't enforce this using a unique index because it will fail since many rows may have null in one of the columns. If I were using a regular RDBMS with a sparse unique multi-column index, this would work fine. Unfortunately Mongo has chosen to work in a different way from all of the RDBMS' out there and cannot support this scenario. Given that partial indexes are not a quick thing to add and don't seem like they will be added anytime soon, why is this issue closed? Please reopen and consider implementing this issue. | ||||||
| Comment by Adam Wróbel [ 25/Jan/14 ] | ||||||
|
This feature is really important for unique indexes. Without this you cannot guarantee uniqueness of a field within a category if that field could be left blank. Think {customer_id: 1, external_id: 1} where customer submitting the document can optionally provide his own external_id that should be unique among his documents. Currently this would generate duplicate key error if he decided not to provide his external_id for multiple documents. This feature is popular among RDBMS. Read this quote from MySql documentation:
http://dev.mysql.com/doc/refman/5.7/en/create-index.html I know that this could be solved by introducing partial indexes (if they allowed enforcement of uniqueness), but that feature seems a long way to come while I imagine this one could be implemented fairly quickly and painlessly with a special option. | ||||||
| Comment by Darren Kramer [ 02/Aug/13 ] | ||||||
|
Would it make sense to have the logic not be restricted to an "all" or "any" but based on the left most field. So with an {x:1,y:1}index, the record is included if x is included and not included if x is not included. The inclusion of y would make no difference. In that way the db.test.count( {x:1}) could still use the sparse index as could any queries on the leading columns and you don't have to either put any restrictions on the optimizer using sparse indexes for uncovered queries yet still retain most of the objectives. Of course the major downside here is only the first field would be considered, which restricts not only the order of the index fields but restricts the number of fields that can be missing in a unique index to just one. But I would have to imagine in many cases that would be sufficient. Maybe it could be a third option for sparePolicy. So something like "all", "any" or "first". Just a thought. | ||||||
| Comment by Johan Hedin [ 10/Jun/13 ] | ||||||
|
Robert, you are so right! In the same way that a spares index is disallowed for $exists: false queries, a sparse "include" index should probably be disallowed if not all index fields is present in the query. I don't understand the query optimizer well enough for trying that out though. I guess sparse indexes are tricky when it comes to how they should work in a general way. Should be interesting to see how this is going to be solved for filtered indexes! | ||||||
| Comment by Robert Dwingle [ 07/Jun/13 ] | ||||||
|
Johan, that's a great idea with sparsePolicy, thanks for making this.
Here's a patch to your test case that covers this: http://hastebin.com/qesuvifutu To make this work properly, I guess, you would also want to adjust the way mongodb chooses indexes while serving a request. In particular, indexes with sparsePolicy: "include" should only be used if request conditions contain all fields covered by the index. | ||||||
| Comment by Johan Hedin [ 21/Apr/13 ] | ||||||
|
I know that this issue is closed and I know about the filtered index issue in https://jira.mongodb.org/browse/SERVER-785 but lets try anyway. Support for sparse index with multiple fields is, as far as I can see, implemented in the current code base and has been for a long time (and with written tests). There is a uassert in the code that suppose to prevent a user from creating a sparse index with multiple fields but the implementation of this is wrong so it will never kick in. So, if a user creates a sparse index with multiple fields it will work. The semantics for the current implementation is; "exclude a document from the index if all index fields are missing from the document". This "mode" of the index might benefit some, but according to many of the wishes in the discussion in this issue and in https://jira.mongodb.org/browse/SERVER-785 the semantics folks are looking for is; "only include a document in the index if all index fields are present in the document". Getting support for this second "mode" of the index is simply a matter of changing numNotFound == _spec._nFields to numNotFound != 0 here: https://github.com/mongodb/mongo/blob/master/src/mongo/db/indexkey.cpp#L429 Provided that I have not missed any complicated corner case regarding this, I have the following suggestions: 1. Change the documentation so that it is clear that sparse index with multiple fields is supported. So, this is basically what is already suggested by @Keith and @Scott above. Nothing new. But, since almost all of the coding is already in there this could be in 2.6 with almost no effort. Even though you would be able to achieve the same with filtered indexes, that issue is in "Planning Bucket B" so it will be a long time before that will get into the code. In the meantime, a lot of users could benefit from this "simpler" form of filtered indexes, including me. You can find the code/patch that does this (with test case) here: https://github.com/johanhedin/mongo/commits/SERVER-2193 With this patch you could create a index like this:
and only documents where both a and b are present will be included in the index. If sparsePolicy is left out (the default) the index will work as before. And of course, the name sparsePolicy is just an suggestion. I have only addressed v1 indexes but that same seem to be doable for v0 indexes as well if that is desired. I'm happy to create a pull request if this is something you would consider. For my use case, this would be a HUGE improvement since we are starting to scale from hundreds of millions of documents to hundreds of billions of documents and RAM usage for our indexes is a big issue costing a lot of money for hardware that just store "empty" values. I can step up and write the documentation for this as well. | ||||||
| Comment by Vinsen Mego [ 26/Mar/13 ] | ||||||
|
I would like to vote for this, may be this feature can be reconsidered. Bellow are my case: ).sort( {viewed:-1}).limit(10); The collection has 50million documents, bellow are indexes I tried : , {sparse:true} ); 2. db.photo.ensureIndex( {hashtag:1, viewed:-1}, {sparse:true} ); | ||||||
| Comment by Eliot Horowitz (Inactive) [ 21/Feb/12 ] | ||||||
|
@roy - a filtered index would cover this case as you can filter otu thigns where song_id was null and then make it unique | ||||||
| Comment by Roy Smith [ 21/Feb/12 ] | ||||||
|
Here's a use case (I think) for this. I've got a collection where every document has song_id and station_id fields. Some small percentage of the documents also have session_id fields. I want to make sure that if there is a session_id, the (session_id, song_id, station_id) triple is unique. Trying to create the unique sparse index fails: > db.song_votes.ensureIndex( {session_id:1, station_id:1, song_id:1}, {unique:true, sparse:true}) I don't fully understand the filtered index proposal, so it's posible that would work for this. | ||||||
| Comment by Eliot Horowitz (Inactive) [ 31/Dec/11 ] | ||||||
|
@thilo - a filtered index would handle this. | ||||||
| Comment by Thilo Planz [ 29/Dec/11 ] | ||||||
|
"It makes more sense to exclude documents from the index if ALL fields in the index are missing." Can we re-open this? Sparse multi-field indexes are not exactly a duplicate of filtered indexes, I think. I have the following situation: A file system, every file is a MongoDB document, with parentId and filename. The combination of {parentId + filename}needs to be unique. However, there are also multiple "file system roots", that do not have a parentId or a filename. Because the unique index cannot be made sparse right now, I cannot have multiple of these roots (my current workaround: leave just the parentId blank, and give them a dummy unique filename). Or would filtered indexes cover this? I.e. could I have a filter that says "parentId not null"? In this case, we can leave this one closed (or rather, close it once we do get filtered indexes). | ||||||
| Comment by Scott Hernandez (Inactive) [ 18/Jan/11 ] | ||||||
|
If we had filtered indexes I don't think we need sparse indexes, or this; Since we don't, this is useful. | ||||||
| Comment by Justin Dearing [ 18/Jan/11 ] | ||||||
|
My gut instinct was to vote on this. However, I don't see a need for this if I have a filtered index. In my case of compound inddex the index was a unique index on {documentReference, emailAddress}in a registration collection. Users that were invited, but not registered did not have the document field filled out. Another partial index for the unregistered users could cover them for when I confirm invitees. | ||||||
| Comment by Eliot Horowitz (Inactive) [ 08/Jan/11 ] | ||||||
|
I agree. | ||||||
| Comment by Keith Branton [ 08/Jan/11 ] | ||||||
|
I've been thinking about this some more and realize there's a huge flaw in my earlier ideas. (and I'm now wondering how rdbms' typically handle this - but am struggling to find any implementation details) I propose we drop this jira in favor of Scott's http://jira.mongodb.org/browse/SERVER-785 which would seem to cover all the bases nicely. I'm moving my vote from here to there. | ||||||
| Comment by Scott Hernandez (Inactive) [ 09/Dec/10 ] | ||||||
|
I could imagine the sparse options could be "Any"/"All" (not just "true" as it now). http://jira.mongodb.org/browse/SERVER-785 – Filtered indexes could address this sparse index needs completely. If the user could specify the rules for inclusion in the index then it is very clear. Also, as much as fields should be used in left-to-right order (or combination), it isn't exactly true. The query optimizer may get smarter and start to choose indexes which don't fit that rule. Already you can hint and it will use an index where you don't supply values for the first field in the index. I think "sparse" is a misleading name because it doesn't describe the way in which the index is sparse. "skipDocsWithMissingFields" says more what it does, but isn't very succinct. | ||||||
| Comment by Keith Branton [ 08/Dec/10 ] | ||||||
|
It makes more sense to exclude documents from the index if ALL fields in the index are missing. Compound indexes also serve queries on the first, first+second, etc fields in the index, and so an index on a,b,c should be able to find all the documents where a=1, not only the ones where b and/or c also have values. This is more intuitive, and should be the default behavior. That is what http://jira.mongodb.org/browse/SERVER-484 suggested and what quite a few voters seem to want. I can see Scott's point too - for polymorphic collections his suggestion really makes a lot of sense. Maybe a new option could be passed at index creation time... db.people.ensureIndex( {title : 1, name : 1}, {sparse : true, sparseIfAnyValueMissing : true}) hopefully someone can think of a better name for that option though | ||||||
| Comment by Scott Hernandez (Inactive) [ 08/Dec/10 ] | ||||||
|
A common use case is where you have multiple document types in a single collection. I want to create an index that only applies to one of the documents types (which have certain fields which I want to index). I might also want to create an index if the first (N) fields exist, as well. This case might be addressed with filtered/partial index support. | ||||||
| Comment by Eliot Horowitz (Inactive) [ 08/Dec/10 ] | ||||||
|
I thought about that, but I'm not sure those semantics make sense. Do you have a use case? |