[SERVER-29219] Test and consider exposing different compression engines for MongoDB users Created: 15/May/17  Updated: 22/Nov/18  Resolved: 20/Nov/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Alexander Gorrod Assignee: Brian Lane
Resolution: Done Votes: 0
Labels: nonnyc, storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-36352 enable zstd compression support in Mo... Closed

 Description   

The WiredTiger storage engine supports several compression engines that are not exposed via MongoDB. It would be interesting to know whether there is value in exposing additional compression engines. It would also be very valuable to create a test that measures the trade-off between compression ratio and CPU usage of different compression engines for a few representative MongoDB workloads.

The compression engines of particular interest are LZ4 and Zstandard. The full set of compression libraries supported by the WiredTiger team is here:
https://github.com/wiredtiger/wiredtiger/tree/master/ext/compressors
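For illustration: WiredTiger selects a compressor per table through the `block_compressor` setting in its configuration string, and MongoDB's `create` command can pass such a string through `storageEngine.wiredTiger.configString`. The Python sketch below only builds the command documents. The collection names and helper are hypothetical, and whether `zstd` or `lz4` is accepted depends on how the server was built, which is exactly what this ticket would decide.

```python
# Sketch: build a MongoDB `create` command document that asks WiredTiger to
# use a particular block compressor for a single collection.
# Assumption: "lz4" (and "zstd" on older servers) is only accepted if the
# server build has that WiredTiger compressor available; it is listed here
# to show what exposing it would look like, not current behavior.

WIREDTIGER_COMPRESSORS = ("none", "snappy", "zlib", "zstd", "lz4")


def create_command(collection: str, compressor: str) -> dict:
    """Return a command document suitable for db.command() / runCommand."""
    if compressor not in WIREDTIGER_COMPRESSORS:
        raise ValueError(f"unknown compressor: {compressor!r}")
    return {
        "create": collection,
        # The configString is handed to WiredTiger when the table is created.
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        },
    }


if __name__ == "__main__":
    # One collection per compressor, so every engine can be compared on
    # the same server with the same data.
    for name in WIREDTIGER_COMPRESSORS:
        print(create_command(f"tweets_{name}", name))
```

With PyMongo, each document could be sent as `db.command(create_command("tweets_zstd", "zstd"))`; keeping `create` as the first key matters because the server treats the first key as the command name.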



 Comments   
Comment by Brian Lane [ 18/Sep/18 ]

alexander.gorrod I think your proposed set of test suites looks like a good place to start, and we can always expand on this depending on what the results look like.  Thanks.

Comment by Alexander Gorrod [ 11/Sep/18 ]

I've been thinking about which workloads would be suitable for making this decision. The data sets used in the blog post about compression published shortly after the WiredTiger integration are probably a good set for measuring compression ratios. The other relevant metric is the CPU overhead of compression; we've seen in the past that YCSB can be used to measure the CPU efficiency of compression engines.

I'm going to suggest that the following suite of tests be used to decide whether a different compression scheme offers enough incremental benefit to warrant adding it as an option for MongoDB users:

Compression ratio tests:

Load each dataset using mongoimport. These tests should be run with the zlib, snappy, zstd and lz4 compression libraries, and with no compression (none).

Dataset               Link
Enron email corpus    http://www.cs.cmu.edu/~./enron/
Flight database       https://www.transtats.bts.gov/OT_Delay/ot_delaycause1.asp?display=data&pn=1
TPC-H base data set   http://www.tpc.org/information/results_spreadsheet.asp
Twitter data set      MongoDB has an internal test set consisting of about 200k tweets
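One plausible way to score the compression ratio tests is to divide the uncompressed data size by the space the collection occupies on disk; `collStats` reports both (`size` and `storageSize`). This Python sketch assumes one collection per dataset/compressor pair and a freshly loaded collection, so that `storageSize` is dominated by compressed data rather than reusable free space. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionResult:
    """Storage figures for one dataset loaded with one compressor."""

    compressor: str
    size: int          # collStats "size": uncompressed size of the documents
    storage_size: int  # collStats "storageSize": bytes allocated on disk

    @classmethod
    def from_coll_stats(cls, compressor: str, stats: dict) -> "CompressionResult":
        """Build a result from the raw collStats reply."""
        return cls(compressor, int(stats["size"]), int(stats["storageSize"]))

    @property
    def ratio(self) -> float:
        """Uncompressed bytes per on-disk byte; higher means better compression."""
        if self.storage_size <= 0:
            raise ValueError("storage_size must be positive")
        return self.size / self.storage_size


def summarize(dataset: str, results: list) -> str:
    """Render one dataset's results, best compression ratio first."""
    lines = [f"{dataset}:"]
    for r in sorted(results, key=lambda r: r.ratio, reverse=True):
        lines.append(
            f"  {r.compressor:<7}{r.ratio:6.2f}x  {r.storage_size:>14,} bytes on disk"
        )
    return "\n".join(lines)
```

With PyMongo, the `stats` dictionary could come from `db.command("collStats", "enron_zstd")` (a hypothetical collection name), collected once per compressor and then passed to `summarize` for each dataset.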

CPU/performance tests:

We should run the same set of compression libraries against the following YCSB phases: load, 100% read, 95% read/5% update, 100% update, and 50% read/50% update, using 5 million 1 KB documents. Each workload should execute 20 million operations.
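The YCSB phases above could be expressed as CoreWorkload property files, one per read/update mix; the load phase would run `ycsb load` with the same record settings, and each mix would then be run with `ycsb run`. This is a hedged Python sketch: it assumes YCSB's default `fieldcount=10` and `fieldlength=100` approximate the 1 KB documents, leaves out the workload class and MongoDB binding settings, and uses hypothetical phase names.

```python
RECORD_COUNT = 5_000_000      # documents inserted by the load phase
OPERATION_COUNT = 20_000_000  # operations executed by each run phase
FIELD_COUNT = 10              # assumed YCSB default: 10 fields per document
FIELD_LENGTH = 100            # assumed YCSB default: 100 bytes per field

# (hypothetical phase name, readproportion, updateproportion) per run phase.
RUN_PHASES = [
    ("read100", 1.0, 0.0),
    ("read95_update5", 0.95, 0.05),
    ("update100", 0.0, 1.0),
    ("read50_update50", 0.5, 0.5),
]


def raw_data_bytes() -> int:
    """Approximate uncompressed field payload written by the load phase."""
    return RECORD_COUNT * FIELD_COUNT * FIELD_LENGTH


def properties(read: float, update: float) -> dict:
    """CoreWorkload properties for one read/update mix."""
    if abs(read + update - 1.0) > 1e-9:
        raise ValueError("read and update proportions must sum to 1")
    return {
        "recordcount": str(RECORD_COUNT),
        "operationcount": str(OPERATION_COUNT),
        "fieldcount": str(FIELD_COUNT),
        "fieldlength": str(FIELD_LENGTH),
        "readproportion": str(read),
        "updateproportion": str(update),
        # Disable every other operation type so only the stated mix runs.
        "insertproportion": "0",
        "scanproportion": "0",
        "readmodifywriteproportion": "0",
    }


def to_properties_file(props: dict) -> str:
    """Serialize to the key=value format that YCSB reads via -P."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(f"raw data: {raw_data_bytes():,} bytes")
    for name, read, update in RUN_PHASES:
        print(f"# {name}")
        print(to_properties_file(properties(read, update)))
```

Under these assumptions the load phase writes 5,000,000 × 10 × 100 = 5,000,000,000 bytes (about 5 GB, or roughly 4.66 GiB) of raw field data before compression, not counting keys and BSON overhead, which gives a baseline against which each compressor's on-disk size can be compared.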

brian.lane@mongodb.com and asya, do you think the above set of results would give us enough information to decide whether to support new compression libraries in MongoDB?

Comment by Asya Kamsky [ 19/May/17 ]

We are definitely interested in doing this. Next step will be to scope and schedule this.

Comment by Nick Judson [ 15/May/17 ]

Yes please. I've found LZ4 to be the best for my app.

Generated at Thu Feb 08 04:20:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.