[SERVER-43977] Fully Indexed Collection Created: 12/Oct/19  Updated: 27/Oct/23  Resolved: 18/Nov/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alex Leong Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   

We are creating a large collection with two fields, both of which are fully indexed.

 

For example :

db.collection.insert ( { x : "sddsqq" })

db.collection.createIndex ( { x : 1, _id : 1 })

 

Obviously, the index is required for fast random access. The only problem is that the index could end up being larger than the documents in terms of storage. Moreover, twice the storage is required to store the same 'data' (documents and index entries).
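
One way to put a number on this concern is to compare the `storageSize` and `totalIndexSize` fields reported by `db.collection.stats()`. The following is a minimal sketch rather than an official procedure: the helper name `indexToDataRatio` and the sample numbers are illustrative assumptions, not values from this ticket.

```javascript
// Sketch: ratio of on-disk index size to on-disk data size for one collection.
// `stats` is expected to look like the output of db.collection.stats():
//   storageSize    - bytes allocated on disk for the documents
//   totalIndexSize - bytes used by all indexes, including the _id index
function indexToDataRatio(stats) {
  if (!stats.storageSize) {
    return null; // empty collection: the ratio is undefined
  }
  return stats.totalIndexSize / stats.storageSize;
}

// In the mongo shell (needs a running server):
//   indexToDataRatio(db.collection.stats())

// Made-up numbers, for illustration only.
var exampleStats = { storageSize: 1000, totalIndexSize: 1500 };
var exampleRatio = indexToDataRatio(exampleStats);
console.log(exampleRatio);
```

A ratio above 1 means the indexes occupy more disk space than the documents themselves. The `indexSizes` field of the same output breaks the total down per index.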

 

Does MongoDB have any plans for an improvement similar to index-organized tables (IOT) in Oracle?

 

Thanks

 

Alex Leong

 



 Comments   
Comment by Dmitry Agranat [ 03/Nov/19 ]

Hi Alex,

We are considering a clustered index project for the future. However, we currently do not have enough information to say if or when this would be implemented.

Thanks,
Dima

Comment by Alex Leong [ 16/Oct/19 ]

Hi Dima

Whether it is zlib or even zstd, double the amount of storage is still being used unnecessarily.

Plus, the wildcard index feature in MongoDB v4.2 allows indexing potentially all fields in a collection.

 

There are certainly use cases for this, for example when someone uses a collection as a very large dictionary/map/lookup kind of data structure.

 

Thanks

Alex

Comment by Dmitry Agranat [ 16/Oct/19 ]

Hi Alex,

In order for us to consider a feature request, we would like to understand the use case and data model first. This is important to determine what current alternatives we already have. Could you provide the requested information, as detailed as possible, from my last comment?

Since storage size is your main concern, have you considered using zlib compression?

Thanks,
Dima
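
The zlib suggestion above can be sketched as follows. This is only an illustration under stated assumptions: the helper `zlibCollectionOptions` and the collection name `lookup` are hypothetical, and the block compressor has to be chosen when the collection is created.

```javascript
// Sketch: options object that asks WiredTiger to compress a collection's
// data blocks with zlib instead of the default snappy.
// Hypothetical helper name; the option path itself is the documented
// storageEngine.wiredTiger.configString setting of db.createCollection().
function zlibCollectionOptions() {
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=zlib" }
    }
  };
}

// In the mongo shell (needs a running server; "lookup" is a made-up name):
//   db.createCollection("lookup", zlibCollectionOptions())

var options = zlibCollectionOptions();
console.log(JSON.stringify(options));
```

This setting applies to the collection data. Indexes use prefix compression by default, so it does not remove the duplication between documents and index entries that the request is about.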

Comment by Alex Leong [ 15/Oct/19 ]

Hi Dima

Thank you for your response.

Yes, our concern is the storage size, in particular that twice the storage is needed.

Sorry, there is not much I can tell you about its design; it was created by one of our developers. The documents look like this:

{ "hashedEmail" : NumberLong("58504134050438472"), "messageId" : "<0.19kt94col1jgCXXXCXq@indeedemail.com>" }
{ "hashedEmail" : NumberLong("1835081692994461526"), "messageId" : "<0.1a3a41gdXCXXCb@indeedemail.com>" }

Thanks

Alex

 

 

Comment by Dmitry Agranat [ 15/Oct/19 ]

Hi aleong@indeed.com,

Just to make sure I understand what you are requesting, please let me know if this is correct:

  • Issue: index could end up being larger than the document in terms of storage
  • Feature request: Introduce Index Organized Tables (IOT) which have their primary key data and non-key column data stored within the same B*Tree structure.

As for the IOT feature request, could you please describe your use case and data model, and provide us with a few sample documents from a collection? Also, is the total storage size the only concern?

I am asking these questions because, if the main issue is the total storage size, the index ending up larger than the documents is most probably caused by fragmentation over time. As with a B*Tree index, an IOT can also become fragmented and may consume more total storage space.

Thanks,
Dima

Generated at Thu Feb 08 05:04:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.