[SERVER-82551] Use parallel compressor to speedup binaries archival Created: 29/Oct/23  Updated: 12/Nov/23  Resolved: 02/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: use_pigz_compressor.patch
Issue Links:
Related
Assigned Teams:
Server Development Platform
Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2020-04-06
Participants:

 Description   

Summary

Using the pigz parallel compressor to create the binary tarball would reduce the archive_dist_test_debug task runtime from ~14 min to ~5 min.

Long description

The majority of Evergreen tasks will run only after the mongo binaries have been compiled, compressed, and uploaded to S3.

For the Amazon Linux 2 variant these steps take roughly:

The archive_dist_test_debug task is mostly composed of two parts:

The compression is performed using the following tar command:

/bin/tar -C build/install -T /data/mci/5098d994527fa548b1195cf0b5831e45/src/mongo-debugsymbols.tgz.filelist -czf mongo-debugsymbols.tgz

tar by default uses a single-threaded compression algorithm (gzip), which means we are using only 1 of the 16 cores available (we currently use amazon2-arm64-large for this task).

It is possible to simply tell the tar command to use the parallel compressor pigz and make use of all the available cores, as sketched below.
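With GNU tar the compressor can be swapped via the -I (--use-compress-program) flag. A minimal sketch of the adjusted command, assuming pigz is installed on the distro (by default pigz spawns one compression thread per available core):

/bin/tar -C build/install -T /data/mci/5098d994527fa548b1195cf0b5831e45/src/mongo-debugsymbols.tgz.filelist -I pigz -cf mongo-debugsymbols.tgz

Note that -z is dropped: -I pigz replaces tar's built-in gzip, while the output remains a gzip-compatible .tgz that plain gunzip can still decompress.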

A quick experiment showed that using pigz reduces the tar command execution time from 9.22 minutes to 35 seconds.
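One way to reproduce the comparison on the same host (filelist path shortened here for readability; recent GNU tar accepts arguments inside -I, and pigz -p pins the thread count explicitly):

# baseline: tar's built-in single-threaded gzip
time /bin/tar -C build/install -T mongo-debugsymbols.tgz.filelist -czf mongo-debugsymbols.tgz
# parallel: pigz pinned to 16 threads
time /bin/tar -C build/install -T mongo-debugsymbols.tgz.filelist -I 'pigz -p 16' -cf mongo-debugsymbols.tgz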



 Comments   
Comment by Alex Neben [ 12/Nov/23 ]

From my quick playing around with things, this was the best query I could come up with. I just checked today and it doesn't seem to be making a big dent, so my feeling is that we do not need to backport this.

Comment by Tommaso Tocci [ 06/Nov/23 ]

alex.neben@mongodb.com is your query monitoring the archive_dist_test_debug task specifically? I think we won't see much improvement in the other `scons compile` steps if there are no binaries to archive.

Comment by Alex Neben [ 05/Nov/23 ]

I made this query that does a (hopefully) good job of teasing out your change to see if it has a major impact overall. If the answer is yes then
1. We can calculate how much $$$ we saved by this change

2. We can decide on a backport based on these numbers.

Either way this is an awesome change! Great work!

Comment by Githook User [ 02/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82551 Use parallel compressor to speedup binaries archival
Branch: master
https://github.com/mongodb/mongo/commit/85da569702ecb99412a087a11d6ecb3e8501df0f

Comment by Alex Neben [ 31/Oct/23 ]

I think whoever is on triage rotation this (or next) week should try to take this on. This is a huge win! tommaso.tocci@mongodb.com if you are already working on it I don't want to steal it from you so just communicate on this ticket if you want to take it over the finish line or not.

Comment by Trevor Guidry [ 30/Oct/23 ]

tommaso.tocci@mongodb.com Alright, I will assume the python package is just unrelated and broken. Thanks for the answer.

Comment by Tommaso Tocci [ 30/Oct/23 ]

trevor.guidry@mongodb.com pigz v2.3 was released 10 years ago and is nowadays shipped as a standard package in many Linux distros. I don't know for sure if pigz works correctly, but I would assume so.

Comment by Trevor Guidry [ 30/Oct/23 ]

Is pigz reliable? I tried using a pigz Python library before (https://github.com/bguise987/pigz-python) and some of the files compressed by it were corrupted. It is possible the Python library is implemented incorrectly and the pigz binary itself is fine.
