[SERVER-27261] All secondary indexes are compressed but not primary key(_id) Created: 02/Dec/16  Updated: 08/Feb/23  Resolved: 03/Jan/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: 아나 하리 Assignee: Geert Bosch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

1) Start MongoDB with the WiredTiger options below:
{{wiredTiger:
  engineConfig:
    cacheSizeGB: 1
    journalCompressor: none
    directoryForIndexes: false
  collectionConfig:
    blockCompressor: none
  indexConfig:
    prefixCompression: false}}
2) Create a collection with a few secondary indexes.
3) Insert a few documents that contain a visible character string (such as 'MATT').
4) Run hexdump on the data file and on each index file.
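
Step 4 can also be automated instead of reading hexdump output by eye. The following is a minimal sketch, assuming the files of interest end in `.wt` and are small enough to read into memory; the helper name `files_containing` and the directory path in the usage note are hypothetical and not part of MongoDB or WiredTiger.

```python
from pathlib import Path


def files_containing(directory, needle):
    """Return the sorted names of *.wt files in `directory` whose raw bytes contain `needle`."""
    hits = []
    for path in sorted(Path(directory).glob("*.wt")):
        # Read the whole file; acceptable for the small test files in this reproduction.
        if needle in path.read_bytes():
            hits.append(path.name)
    return hits
```

For example, `files_containing("/data/db", b"MATT")` (with the path replaced by the actual dbPath) would list the files in which the inserted string is stored in literal form.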

Sprint: Storage 2017-01-23
Participants:

 Description   

I had heard that index compression is not very useful, so index blocks are not compressed.
The manual also does not mention compression of index blocks.
So I thought that index blocks are not compressed by default.
There are no index compression options in the WiredTiger engine (I know we can compress index blocks by specifying a special configString option).

But this is actually not the case.
After creating a collection with a few secondary indexes, I examined the data files with hexdump.
The primary key index is not compressed, but all secondary indexes are compressed.

Is this expected MongoDB behavior or not?
Why are secondary indexes compressed while the primary index is not?
And (if this is the expected behavior) could you mention it in the MongoDB manual?



 Comments   
Comment by Geert Bosch [ 03/Jan/17 ]

While MongoDB doesn't use compression for indexes on WiredTiger, we don't store index keys verbatim. In particular, we use a KeyString format, rather than BSON, that ensures that all keys are binary comparable. The format also does things like flipping all bits, depending on the specified ordering, and uses a much more comprehensive encoding for numeric types. For strings with non-simple locales, we use the ICU library to recode strings so they are binary comparable using the locale-specific rules and desired strength (case-insensitive or not, for example).
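
To illustrate what "binary comparable" and "flipping all bits" mean here, the sketch below is a toy order-preserving encoding. It is not the actual KeyString format: the function names, the fixed 64-bit integer layout, and the 0x00 string terminator are all assumptions made for illustration, and strings are assumed to contain no embedded NUL characters.

```python
import struct


def encode_int(value, descending=False):
    """Encode a signed 64-bit integer so that bytewise order matches numeric order."""
    # Adding 2**63 maps the signed range onto 0 .. 2**64 - 1, so the big-endian
    # unsigned bytes sort in the same order as the original numbers.
    raw = struct.pack(">Q", value + (1 << 63))
    if descending:
        # Flipping every bit reverses the bytewise sort order.
        raw = bytes(b ^ 0xFF for b in raw)
    return raw


def encode_str(text, descending=False):
    """Encode a string as UTF-8 plus a 0x00 terminator so shorter prefixes sort first."""
    raw = text.encode("utf-8") + b"\x00"
    if descending:
        raw = bytes(b ^ 0xFF for b in raw)
    return raw
```

With this toy encoding, sorting the encoded bytes gives the same order as sorting the original values, and a string encoded for a descending index no longer appears in literal form: `b"MATT" in encode_str("MATT", descending=True)` is `False`.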

Finally, in some cases prefix and/or suffix compression may be applied: this really isn't "compression" like gzip or snappy, but just not storing the repeated common prefix for a list of keys. All these methods generally result in a very significant speed improvement as well as reduced storage and cache pressure.
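
The idea of not storing the repeated common prefix can be sketched as follows. This is a simplified illustration, not WiredTiger's on-disk layout; the function names and the `(shared_length, suffix)` pair representation are assumptions.

```python
def prefix_compress(sorted_keys):
    """Turn sorted byte keys into (shared_prefix_length, suffix) pairs."""
    pairs = []
    prev = b""
    for key in sorted_keys:
        # Count how many leading bytes this key shares with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        pairs.append((shared, key[shared:]))
        prev = key
    return pairs


def prefix_decompress(pairs):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in pairs:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

For the keys `b"user:alice"`, `b"user:alina"`, `b"user:bob"`, only the first key is stored in full; the second is stored as `(8, b"na")` and the third as `(5, b"bob")`, which is why a full key may not be visible in a raw dump of the file.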

In short, you cannot in general expect to be able to see your strings in literal form in your *.wt files, even with compression turned off.
