[SERVER-17741] LZ4 compressor for mongod Created: 25/Mar/15  Updated: 09/Jan/18  Resolved: 03/Aug/15

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Quentin Conner Assignee: David Hows
Resolution: Won't Fix Votes: 2
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File lz4-rc11-20150305.patch    
Issue Links:
Depends
depends on WT-1751 Add LZ4 compression to WiredTiger sup... Closed
Related
is related to SERVER-32595 LZ4 compression in MongoDB Closed
Participants:

 Description   

An LZ4 compressor PR was accepted by wiredtiger/wiredtiger and is coming to mongodb/mongo sometime soon.

Patches to mongod and the scons build are needed to integrate this compressor with the WiredTiger storage engine.

A patch against 3.0.0-rc11 is attached to this ticket; it patches the vendored WiredTiger library as well as mongod and the scons build. The src/third_party/wiredtiger portions will not be needed once a newer WiredTiger library is imported.

Ping me when/if you are ready to adopt this feature and I can produce a new patch covering just the scons build and mongod.
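
For reference, a minimal sketch of how the compressor would be selected once the patch is applied, assuming lz4 is simply exposed through the existing block-compressor option (unpatched 3.0 builds accept only none, snappy and zlib here):

# Hypothetical startup once the attached patch is applied; "lz4" is not a
# valid value for this option in an unpatched build.
mongod --dbpath /data/db --storageEngine wiredTiger \
    --wiredTigerCollectionBlockCompressor lz4

Under the hood this would map to the block_compressor=lz4 table configuration that WT-1751 adds to WiredTiger.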



 Comments   
Comment by Oleg Rekutin [ 09/Feb/17 ]

CPU usage is improved at comparable compression ratios, and reduced CPU is what I'm interested in. This tends to come into play when nodes are catching up with the oplog after being down (or after a backup restore or a copy-data-style initial sync).

Comment by Michael Cahill (Inactive) [ 03/Aug/15 ]

The performance results weren't compelling compared to snappy in our testing. We can revisit this later if we see workloads where snappy is the bottleneck.

Comment by David Hows [ 09/Jul/15 ]

Ran a workload as described above. The results show that the LZ4 compressor underperforms considerably compared to snappy, generating only about half the throughput.

Results were:

Run | RunTime (ms) | Throughput (ops/sec)
lz4.1 | 424519 | 23556.07
lz4.2 | 409365 | 24428.08
lz4.3 | 410653 | 24351.46
snappy.1 | 227168 | 44020.28
snappy.2 | 215991 | 46298.23

The YCSB workload file was as follows (it generates roughly 102 GB of data); an example invocation is sketched after the file:

recordcount=10000000
operationcount=10000000
fieldcount=100
fieldlength=100
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
scanproportion=0
insertproportion=0
updateproportion=0
requestdistribution=uniform
threadcount=12
exportmeasurementsinterval=30000
insertretrycount=10
ignoreinserterrors=true
readretrycount=1
timeseries.granularity=100
reconnectionthroughput=10
reconnectiontime=1000
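
As a rough sketch, a run of this kind is typically driven with the standard YCSB MongoDB binding roughly as below (the workload file name and connection string are placeholders; the exact client version and URL used for these runs are not recorded in the ticket):

# Hypothetical invocation; lz4test.properties stands in for the workload file above.
./bin/ycsb load mongodb -s -P workloads/lz4test.properties \
    -p mongodb.url=mongodb://localhost:27017/ycsb
./bin/ycsb run mongodb -s -P workloads/lz4test.properties \
    -p mongodb.url=mongodb://localhost:27017/ycsb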

Comment by David Hows [ 06/Jul/15 ]

Ran some testing with LZ4 r127, snappy and zlib in MongoDB to compare times to insert and re-read data.

Compressor | % +time difference | % shrink coll object size | % difference in fsync time | % difference in load time | % difference in query time
LZ4 | 52.8 | 54.4 | 42.4 | 52.8 | 13.4
Snappy | 72.5 | 50.5 | 51.1 | 72.5 | 29.3
ZLIB | 701.3 | 72.3 | 81.0 | 700.5 | 135.4

david.hows, I edited this table to be legible. I'm assuming that the numbers are compared to no compression?

I'm very interested in the performance of lz4, but YCSB by default uses random binary data, so I'm not convinced it's the best test for this. In my (limited) tests, YCSB data compressed to only about 90% of its original size with snappy, and to about 54% with zlib.
