[SERVER-14051] Add a 'ditto' or repeating group option to indexes. Created: 26/May/14 Updated: 20/Jan/15 Resolved: 20/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | John Page | Assignee: | Ian Whalen (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
Whenever you access an index node, you reach it from a parent or sibling. If the value for this index node is the same as its sibling's or parent's, then storing the value again is redundant and, especially for full-text indexes, a huge waste of space. We could instead store a single byte that means 'the same as the previous one' and dramatically reduce index sizes, with all the performance benefits that brings. It would also be good for the initial node, the one that holds the value, to contain a pointer to the next 'different' node, if we don't do that already. All of this is key to competing with a column store on ad-hoc counting and summarising. I'm sure there is a JIRA for counting indexes that plays into this too; however, this one is about performance and index compression. It is essentially a way to skip through the index the way we can currently skip a sub-object when parsing BSON. |
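A rough sketch of the idea, in Python rather than server code, may help. It assumes a flat, sorted list of (key, record id) pairs instead of a real B-tree, and the names `DITTO`, `build_entries` and `count_by_key` are hypothetical, not anything in the server. The first entry of each run keeps the key and the position of the next 'different' entry; every later entry in the run stores only a marker.

```python
from typing import Any, List, Tuple

# Hypothetical stand-in for a one-byte "same as the previous key" marker.
DITTO = object()


def build_entries(sorted_pairs: List[Tuple[Any, int]]) -> List[list]:
    """Turn (key, record_id) pairs, already sorted by key, into entries of
    the form [key_or_DITTO, record_id, next_different].

    next_different is filled in only on the first entry of each run and
    holds the position of the first entry with a different key.
    """
    entries: List[list] = []
    run_start = None
    for i, (key, record_id) in enumerate(sorted_pairs):
        if run_start is not None and sorted_pairs[run_start][0] == key:
            # Same key as the run head: store the marker, not the key.
            entries.append([DITTO, record_id, None])
        else:
            if run_start is not None:
                entries[run_start][2] = i  # close the previous run
            run_start = i
            entries.append([key, record_id, None])
    if run_start is not None:
        entries[run_start][2] = len(sorted_pairs)
    return entries


def count_by_key(entries: List[list]) -> List[Tuple[Any, int]]:
    """Count entries per key by hopping from run head to run head,
    never visiting the DITTO entries in between."""
    counts = []
    i = 0
    while i < len(entries):
        key, _record_id, next_different = entries[i]
        counts.append((key, next_different - i))
        i = next_different
    return counts


pairs = [("apple", 1), ("apple", 2), ("apple", 3),
         ("pear", 4), ("plum", 5), ("plum", 6)]
print(count_by_key(build_entries(pairs)))
```

In this sketch a count per key touches one entry per distinct key rather than one per document, which is the kind of skipping the description asks for.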
| Comments |
| Comment by John Page [ 14/Aug/14 ] |
|
Sorry, I thought you asked me to rewrite it again, Ramon. |
| Comment by John Page [ 14/Aug/14 ] |
|
I thought I did; I'll put it more simply. Stop saving the same key multiple times in the index; it is really inefficient. Whenever you access a duplicate key, you have just read a sibling with the same value. So where a key is the same as the one before, save a special, small byte sequence to say so. |
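To make the byte-level version concrete, here is a minimal Python sketch. The layout is an assumption made up for illustration, not the server's on-disk format: a one-byte tag, where `0x01` is followed by a two-byte length and the UTF-8 key, and `0x00` means 'same key as the previous entry'. The function names are hypothetical.

```python
import struct
from typing import List

DITTO = b"\x00"    # assumed marker: same key as the previous entry
LITERAL = b"\x01"  # assumed marker: a length-prefixed key follows


def encode_keys(sorted_keys: List[str]) -> bytes:
    """Encode sorted string keys, writing a one-byte marker for repeats."""
    out = bytearray()
    prev = None
    for key in sorted_keys:
        if prev is not None and key == prev:
            out += DITTO
        else:
            raw = key.encode("utf-8")
            # Two-byte big-endian length, so keys up to 65535 bytes.
            out += LITERAL + struct.pack(">H", len(raw)) + raw
        prev = key
    return bytes(out)


def decode_keys(data: bytes) -> List[str]:
    """Rebuild the full key list from the encoded bytes."""
    keys: List[str] = []
    pos = 0
    while pos < len(data):
        tag = data[pos:pos + 1]
        pos += 1
        if tag == DITTO:
            keys.append(keys[-1])  # repeat the key we just read
        else:
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2
            keys.append(data[pos:pos + length].decode("utf-8"))
            pos += length
    return keys


keys = ["mongodb"] * 4 + ["redis"]
encoded = encode_keys(keys)
assert decode_keys(encoded) == keys
print(len(encoded))  # 21 bytes under this assumed layout
```

Each repeated key costs one byte here regardless of the key's length, so the saving grows with both key size and the number of duplicates.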