[SERVER-64925] Use block compression for secondary indexes on clustered collections Created: 25/Mar/22  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: clustered_collections, former-storex-namer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-65976 Enable block compression for column i... Closed
Assigned Teams:
Storage Execution
Sprint: Execution Team 2022-10-17
Participants:

 Description   

We don't use WiredTiger block compression for secondary indexes on non-clustered collections because we already use prefix compression and the stored RecordIds are pretty compact.

Clustered collections have larger RecordIds which take up more space, but may compress better.

We should evaluate using block compression for secondary indexes on clustered collections to reduce storage size.
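For reference, WiredTiger block compression can already be enabled manually on a single index today via the index's `storageEngine` option, which would be one way to prototype the evaluation. A minimal sketch using pymongo (a configuration example, not the proposed implementation; it assumes a locally running mongod, and the collection/field names and choice of `snappy` are illustrative):

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod on the default port
db = client["test"]

# Pass a raw WiredTiger config string as the index's storageEngine option.
# block_compressor=snappy enables block compression for this index only,
# overriding the default (no block compression for indexes).
db["clusteredColl"].create_index(
    [("userId", 1)],
    storageEngine={"wiredTiger": {"configString": "block_compressor=snappy"}},
)
```

Comparing the on-disk size of such an index against an uncompressed one on the same clustered collection would give a first data point.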



 Comments   
Comment by Connie Chen [ 11/Nov/22 ]

michael.gargiulo@mongodb.com - taking this out of the desired bucket and throwing it into the backlog. Let us know if we should reconsider

Comment by Louis Williams [ 04/Oct/22 ]

matthew.saltz@mongodb.com, the existing workload only creates one secondary index. We could potentially create a new workload with more indexes, but I would start by looking at that workload first. There is a version of the workload that uses larger RecordIds, which would be an interesting point of comparison between that and the one with smaller RecordIds.

Also, consider these other challenges:

  • If the workload is entirely in cache, which is likely, then we won't be observing the cost of decompression, only the cost of compression. We may want to consider running with a smaller cache size for comparison. This would be easiest done locally.
  • I'm not sure that Genny saves the data files, so you may need to run locally to check the size of the indexes anyways.

So no, I don't think we necessarily need a new workload. Some local testing could answer the questions that we're after.
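The size question above can be approximated locally before touching real data files. A rough sketch of the tradeoff on synthetic index blocks (the entry layouts are assumptions for illustration, not MongoDB's actual on-disk format, and zlib stands in for a WiredTiger block compressor such as snappy or zstd):

```python
import zlib

# Synthetic index blocks (assumed layouts, for illustration only):
# - "compact": 8-byte integer RecordIds, as on a non-clustered collection
# - "clustered": 12-byte ObjectId-like RecordIds sharing a 4-byte prefix;
#   larger per entry, but highly redundant within a block
N = 1000
compact_block = b"".join(i.to_bytes(8, "big") for i in range(N))
prefix = b"\x63\x2b\x9a\x01"  # e.g. a shared timestamp prefix
clustered_block = b"".join(prefix + i.to_bytes(8, "big") for i in range(N))

def ratio(block: bytes) -> float:
    """Compressed size as a fraction of raw size."""
    return len(zlib.compress(block)) / len(block)

print(f"compact:   {len(compact_block)} bytes raw, ratio {ratio(compact_block):.2f}")
print(f"clustered: {len(clustered_block)} bytes raw, ratio {ratio(clustered_block):.2f}")
```

The interesting comparison is whether the clustered block's compressed size falls back near, or below, the compact block's, which is the scenario where block compression pays for itself.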

Comment by Matthew Saltz (Inactive) [ 03/Oct/22 ]

louis.williams@mongodb.com For performance testing, do you think existing sys-perf workloads will be sufficient or will we need to write targeted tests for this?

Comment by Connie Chen [ 29/Mar/22 ]

We'll want to evaluate the performance tradeoffs before committing to this change.

Generated at Thu Feb 08 06:01:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.