[SERVER-43977] Fully Indexed Collection Created: 12/Oct/19 Updated: 27/Oct/23 Resolved: 18/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Alex Leong | Assignee: | Dmitry Agranat |
| Resolution: | Community Answered | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
We are creating a large collection with 2 fields, both of which are fully indexed.
For example:
db.collection.insert({ x : "sddsqq" })
db.collection.createIndex({ x : 1, _id : 1 })
Obviously, the index is required for fast random access. The only problem is that the index could end up being larger than the documents in terms of storage. Moreover, 2x the storage is required to store the same 'data' (document and index entries).
Does Mongo have any plans to make an improvement similar to index-organized tables (IOTs) in Oracle?
Thanks
Alex Leong
|
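To make the storage concern in the description measurable, the index footprint can be compared with the data footprint. The following is a minimal sketch, assuming a statistics object shaped like the output of db.collection.stats() (fields storageSize, totalIndexSize and indexSizes, in bytes). The helper name indexOverhead and the sample numbers are hypothetical and are not measurements from this ticket.

```javascript
// Summarise how much storage the indexes use relative to the documents.
// `stats` is assumed to have the shape returned by db.collection.stats().
function indexOverhead(stats) {
  const ratio =
    stats.storageSize > 0 ? stats.totalIndexSize / stats.storageSize : NaN;
  return {
    dataBytes: stats.storageSize,
    indexBytes: stats.totalIndexSize,
    indexToDataRatio: ratio,
    perIndexBytes: Object.assign({}, stats.indexSizes),
  };
}

// Hypothetical numbers for illustration only.
const sampleStats = {
  storageSize: 400 * 1024 * 1024,
  totalIndexSize: 500 * 1024 * 1024,
  indexSizes: { _id_: 200 * 1024 * 1024, x_1__id_1: 300 * 1024 * 1024 },
};
const report = indexOverhead(sampleStats);
console.log(report.indexToDataRatio); // 1.25

// In the mongo shell, the same function could be fed live statistics:
//   indexOverhead(db.collection.stats())
```

A ratio above 1 means the indexes occupy more space on disk than the compressed documents, which is the situation the description is worried about.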
| Comments |
| Comment by Dmitry Agranat [ 03/Nov/19 ] |
|
Hi Alex, We are considering a clustered indexes project for the future. However, we currently do not have enough information to say if or when this would be implemented. Thanks, |
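For readers arriving later: MongoDB 5.3 and newer document clustered collections, where documents are stored in the order of a clustered index on _id and no separate _id index is kept. The sketch below is not from this ticket. It assumes such a server version, and the collection name "emailLookup", the index name, and the helper clusteredCollectionOptions are hypothetical.

```javascript
// Build createCollection options for a clustered collection.
// The clustered index key must be { _id: 1 } and unique must be true.
function clusteredCollectionOptions(indexName) {
  return {
    clusteredIndex: { key: { _id: 1 }, unique: true, name: indexName },
  };
}

const clusteredOptions = clusteredCollectionOptions("emailLookup clustered key");
console.log(JSON.stringify(clusteredOptions));

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.createCollection("emailLookup", clusteredOptions);
}
```

Using this for the lookup data in this ticket would mean storing the lookup key in _id, which assumes the key is unique per document.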
| Comment by Alex Leong [ 16/Oct/19 ] |
|
Hi Dima, Whether it is zlib or even zstd, it is still double the amount of storage, unnecessarily. Plus, the wildcard index feature in Mongo v4.2 allows indexing potentially all fields in a collection.
There are certainly use cases for this, for example when someone is using a collection as a very large dictionary/map/lookup kind of data structure.
Thanks, Alex |
| Comment by Dmitry Agranat [ 16/Oct/19 ] |
|
Hi Alex, In order for us to consider a feature request, we would like to understand the use case and data model first. This is important to determine what current alternatives we already have. Could you provide the requested information, as detailed as possible, from my last comment? Since storage size is your main concern, have you considered using zlib compression? Thanks, |
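The zlib suggestion above can be tried per collection through the WiredTiger storage engine options of createCollection. This is a minimal sketch, assuming the WiredTiger storage engine; the helper compressedCollectionOptions and the collection name "emailLookup" are hypothetical.

```javascript
// Build createCollection options that select a WiredTiger block compressor.
// "snappy" is the default; "zlib" and (from 4.2) "zstd" trade CPU for smaller files.
function compressedCollectionOptions(compressor) {
  const allowed = ["none", "snappy", "zlib", "zstd"];
  if (!allowed.includes(compressor)) {
    throw new Error("unsupported block compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor },
    },
  };
}

const options = compressedCollectionOptions("zlib");
console.log(JSON.stringify(options));

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  db.createCollection("emailLookup", options);
}
```

This setting applies to the collection's data files. Indexes use prefix compression by default, so the effect on index size would need to be measured separately.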
| Comment by Alex Leong [ 15/Oct/19 ] |
|
Hi Dima, Thank you for your response. Yes, our concern is the storage size, plus the fact that twice the storage is needed. Sorry, there is not much I can tell you about its design; it was created by one of our developers. The documents look like this:
{ "hashedEmail" : NumberLong("58504134050438472"), "messageId" : "<0.19kt94col1jgCXXXCXq@indeedemail.com>" }
{ "hashedEmail" : NumberLong("1835081692994461526"), "messageId" : "<0.1a3a41gdXCXXCb@indeedemail.com>" }
Thanks, Alex
|
| Comment by Dmitry Agranat [ 15/Oct/19 ] |
|
Just to make sure I understand what you are requesting, please let me know if this is correct:
As for the IOT feature request, could you please describe your use case and data model, and provide us with a few sample documents from the collection? Also, is the total storage size the only concern? I am asking these questions because if the main issue is the total storage size, and if that is due to the "index could end up being larger than the document in terms of storage" issue, which is most probably caused by fragmentation over time, then note that an IOT, like a B*Tree index, can also become fragmented and may consume more total storage space. Thanks, |