[SERVER-3779] De-dupe index keys inside of a btree bucket Created: 07/Sep/11 Updated: 26/May/15 Resolved: 26/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 6 |
| Labels: | pull-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Because of the way we store indexes it should be easy to have multiple keynodes within a btree bucket pointing to the same data offset. This should result in much smaller indexes for things like "tags" where there are a lot of duplicate keys. |
| Comments |
| Comment by Mathias Stearn [ 26/May/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
This optimization is handled in WT in a more general way as prefix_compression which we enable by default. We have no plans to do this optimization for MMAPv1 indexes. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 16/Dec/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
@mathias, after Aaron's review, I've fixed the core logic in btree, at least Aaron's case works, but what it lacks are more tests to cover my code path... In case there are some corner cases. My modification in rebalancing code uses the idea "do my best to balance", because the size calcuation is so different with with the newly added de-dup code. Sorry for being absent for a long time, it's around the end of the year and my daily job just gets much more busier. btw: what's the release date for 2.2 ? | ||||||||||||||||||||||||||||||||||||||||
| Comment by Mathias Stearn [ 14/Dec/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
@Guanqun, Have you made any progress? I'd really like to see this feature make 2.2. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 09/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Hi Aaron, Thanks for your review, good catch for the setKey function. I've done simple fixes to item 1 and item 4. And I see your point of this test case, yes, the size calculation in rebalancing is too naive and is not correct, I'll fix this later and add more test cases. Then I'll ask you for another review. Thanks! | ||||||||||||||||||||||||||||||||||||||||
| Comment by Aaron Staple [ 08/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
PS Thanks for your work so far! | ||||||||||||||||||||||||||||||||||||||||
| Comment by Aaron Staple [ 08/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Hi - here are some more todos. Can take another look if these are taken care of.
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 07/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
During investigation of the above issue, I found a bug about update and covered index. please see | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 07/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
I've fixed this issue. ps: the example Aaron pointed out has been added to jstests.
Thanks! | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 04/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Hi Eliot, Thanks for the clarification, then I'll make insert as fast as the current implementation and have to let pack related operations (pack, packDataSize) to be a little slower. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 04/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
definitely not full compaction, just good with full correctness and very fast | ||||||||||||||||||||||||||||||||||||||||
| Comment by Aaron Staple [ 02/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Eliot, can you advise? | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 02/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
A quick question: are we aiming for the complete de-duplication in a btree bucket or near-optimal solution? I'm asking this because with this unusual equality between keys and binary representation, in some cases, the best dedupe can't be done. e.g. 5, NumberLong(5), 5, NumberLong(5) etc... If we want to achieve the complete de-dupe, insertion operation could be expensive... | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 02/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Hi Aaron, Thanks for pointing out the equality of keys does not always mean the equality of binary representation. There are two ways (just out of my mind quickly) to fix this case:
I'll post here if there's any progress. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Aaron Staple [ 02/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
If this gets fixed I can take another look. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Aaron Staple [ 02/Nov/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
One todo here is that equality of two keys does not always mean they can share a binary representation. For example, numeric keys contain type information that will be lost if a binary representation is shared improperly: With the patched code:
The covered index query at the end returns an incorrect data type for the second document's 'a' field. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 27/Oct/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
Hi Dwight, Thanks for the encouragement. Regarding your questions: 1. Yes, when keys are the same, their offsets in __KeyNode point to the same location. In practice, it only checks with previous __KeyNode to see if the newly added key is the same with previous one, because we know the __KeyNode are sorted already during the index building process. But for the basicInsert case, we might need to check the next key as well. 2. We don't depend on the references to delete a key. When we delete a key, there are two cases: Hope the above answers your questions. I'm glad to provide more details about my implementation. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Dwight Merriman [ 26/Oct/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
haven't read in detail couple questions that would facilitate reading | ||||||||||||||||||||||||||||||||||||||||
| Comment by Dwight Merriman [ 26/Oct/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
this is definitely something we want to do. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 24/Oct/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
I've written a rough (informal) post about how much we can gain from this de-dup: | ||||||||||||||||||||||||||||||||||||||||
| Comment by Guanqun Lu [ 20/Oct/11 ] | ||||||||||||||||||||||||||||||||||||||||
|
pull request here: https://github.com/mongodb/mongo/pull/130 please review. |