[SERVER-39467] Regression in MD5-Timestamp decider Created: 08/Feb/19  Updated: 29/Oct/23  Resolved: 19/Aug/19

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: 4.3.1

Type: New Feature Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Mathew Robinson (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39999 Permit auto-enablement of --build-fas... Closed
Problem/Incident
is caused by SERVER-21336 Should be possible to use CacheDir an... Closed
Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2019-03-25, Dev Tools 2019-04-08, Dev Tools 2019-05-06, Dev Tools 2019-04-22, Dev Tools 2019-07-01, Dev Tools 2019-07-15, Dev Tools 2019-07-29, Dev Tools 2019-08-12, Dev Tools 2019-08-26
Participants:

 Description   

The MD5-Timestamp decider has gotten slower in SCons 3.0.4 (see https://jira.mongodb.org/browse/SERVER-21336?focusedCommentId=2127333&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2127333)

We should restore it to its prior performance.



 Comments   
Comment by Githook User [ 21/Aug/20 ]

Author:

{'name': 'Mathew Robinson', 'email': 'mathew.robinson@mongodb.com'}

Message: SERVER-50403 SERVER-39467 Update EnsureSConsVersion to 3.1.1

(cherry picked from commit d362622b92319611dc247a94a6d125571c4b9b57)
Branch: v4.2
https://github.com/mongodb/mongo/commit/6fe58c3c890e5af151044ef86019a303ef2443ed

Comment by Githook User [ 21/Aug/20 ]

Author:

{'name': 'Mathew Robinson', 'email': 'chasinglogic@gmail.com'}

Message: SERVER-50403 SERVER-39467 Upgrade vendored SCons to 3.1.1

(cherry picked from commit 364b08d9d8348a9bf93cbff8eff7181da1a8f336)
Branch: v4.2
https://github.com/mongodb/mongo/commit/d02941c18774c64442f6ac18b8b2a51abc53ad50

Comment by Githook User [ 27/Aug/19 ]

Author:

{'name': 'Mathew Robinson', 'email': 'mathew.robinson@mongodb.com', 'username': 'chasinglogic'}

Message: SERVER-39467 Update EnsurePythonVersion to 3.6
Branch: master
https://github.com/mongodb/mongo/commit/60d8961a7db72f5e4a31c8017c4b7442f8e04c54

Comment by Githook User [ 26/Aug/19 ]

Author:

{'name': 'Mathew Robinson', 'email': 'mathew.robinson@mongodb.com', 'username': 'chasinglogic'}

Message: SERVER-39467 Update EnsureSConsVersion to 3.1.1
Branch: master
https://github.com/mongodb/mongo/commit/d362622b92319611dc247a94a6d125571c4b9b57

Comment by Githook User [ 16/Aug/19 ]

Author:

{'name': 'Mathew Robinson', 'email': 'chasinglogic@gmail.com', 'username': 'chasinglogic'}

Message: SERVER-39467 Upgrade vendored SCons to 3.1.1
Branch: master
https://github.com/mongodb/mongo/commit/364b08d9d8348a9bf93cbff8eff7181da1a8f336

Comment by Andrew Morrow (Inactive) [ 12/Aug/19 ]

bdbaddog - Reassigning this to Matt so he can pull in the SCons 3.1.1 update, and then I think we can resolve this ticket.

Comment by Andrew Morrow (Inactive) [ 27/Jun/19 ]

Tested with the following script. The merge base for tests on master was 518f382a78f904b2e7adf8ac36acc3a34a82fb4f and the experiment was at 59404d29e767a3bac0a9c23558e2d136d1caf91a (bd/fix_slow_scons_md5_decider_no_exceptions)

#!/bin/bash
 
set -o verbose
set -o errexit
 
_Prelude="/usr/bin/time --format %eelapsed /opt/mongodbtoolchain/v3/bin/python3 ./buildscripts/scons.py"
_CommonArgs="--cache-dir=$(pwd)/cache VERBOSE=0 -j12 --implicit-cache --disable-warnings-as-errors --install-mode=hygienic --link-model=dynamic --allocator=system --js-engine=none --modules= --variables-files=etc/scons/mongodbtoolchain_stable_gcc.vars --dbg=off --opt=size install-embedded-dev install-embedded-test"
 
# no scons patch, no cache, bfl=off
git checkout $(git merge-base bd/fix_slow_scons_md5_decider_no_exceptions master)
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=off $_CommonArgs 2>&1 | tee ../baseline_nocache_nobfl_build.log
$_Prelude --build-fast-and-loose=off $_CommonArgs 2>&1 | tee ../baseline_nocache_nobfl_rebuild.log
 
 
 
# no scons patch, no cache, bfl=on
git checkout $(git merge-base bd/fix_slow_scons_md5_decider_no_exceptions master)
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=on $_CommonArgs 2>&1 | tee ../baseline_nocache_bfl_build.log
$_Prelude --build-fast-and-loose=on $_CommonArgs 2>&1 | tee ../baseline_nocache_bfl_rebuild.log
 
 
 
# no scons patch, cache, bfl=off
git checkout $(git merge-base bd/fix_slow_scons_md5_decider_no_exceptions master)
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../baseline_cache_nobfl_build.log
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../baseline_cache_nobfl_rebuild.log
\rm -rf build
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../baseline_cache_nobfl_cacherebuild.log
 
 
 
# no scons patch, cache, bfl=on
git checkout $(git merge-base bd/fix_slow_scons_md5_decider_no_exceptions master)
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../baseline_cache_bfl_build.log
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../baseline_cache_bfl_rebuild.log
\rm -rf build
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../baseline_cache_bfl_cacherebuild.log
 
 
 
# scons patch, no cache, bfl=off
git checkout bd/fix_slow_scons_md5_decider_no_exceptions
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=off $_CommonArgs 2>&1 | tee ../experiment_nocache_nobfl_build.log
$_Prelude --build-fast-and-loose=off $_CommonArgs 2>&1 | tee ../experiment_nocache_nobfl_rebuild.log
 
 
 
# scons patch, no cache, bfl=on
git checkout bd/fix_slow_scons_md5_decider_no_exceptions
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=on $_CommonArgs 2>&1 | tee ../experiment_nocache_bfl_build.log
$_Prelude --build-fast-and-loose=on $_CommonArgs 2>&1 | tee ../experiment_nocache_bfl_rebuild.log
 
 
 
# scons patch, cache, bfl=off
git checkout bd/fix_slow_scons_md5_decider_no_exceptions
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../experiment_cache_nobfl_build.log
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../experiment_cache_nobfl_rebuild.log
\rm -rf build
$_Prelude --build-fast-and-loose=off --cache $_CommonArgs 2>&1 | tee ../experiment_cache_nobfl_cacherebuild.log
 
 
 
# scons patch, cache, bfl=on
git checkout bd/fix_slow_scons_md5_decider_no_exceptions
git clean -xfd
mkdir cache
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../experiment_cache_bfl_build.log
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../experiment_cache_bfl_rebuild.log
\rm -rf build
$_Prelude --build-fast-and-loose=on --cache $_CommonArgs 2>&1 | tee ../experiment_cache_bfl_cacherebuild.log
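
As a reading aid, the script above covers a 2 x 2 x 2 matrix (checkout, cache, BFL), with an extra "cacherebuild" pass for the cached configurations. The following Python sketch only enumerates the log file names the script writes; the constant and function names are illustrative and are not part of the build scripts.

```python
from itertools import product

# Axes of the benchmark matrix, in the order the script runs them.
CHECKOUTS = ("baseline", "experiment")
CACHE_MODES = ("nocache", "cache")
BFL_MODES = ("nobfl", "bfl")


def log_names():
    """Return the log file names in the order the script produces them."""
    names = []
    for checkout, cache, bfl in product(CHECKOUTS, CACHE_MODES, BFL_MODES):
        phases = ["build", "rebuild"]
        if cache == "cache":
            # Cached runs also remove build/ and rebuild from the cache.
            phases.append("cacherebuild")
        names.extend(f"{checkout}_{cache}_{bfl}_{phase}.log" for phase in phases)
    return names


if __name__ == "__main__":
    for name in log_names():
        print(name)
```

Only the four `*_nocache_*_rebuild.log` files feed the results shown below.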

Results:

$ ls -1rt *_rebuild.log | grep nocache | xargs tail -n 5
==> baseline_nocache_nobfl_rebuild.log <==
scons: Building targets ...
scons: `install-embedded-dev' is up to date.
scons: `install-embedded-test' is up to date.
scons: done building targets.
72.55elapsed
 
==> baseline_nocache_bfl_rebuild.log <==
scons: Building targets ...
scons: `install-embedded-dev' is up to date.
scons: `install-embedded-test' is up to date.
scons: done building targets.
71.30elapsed
 
==> experiment_nocache_nobfl_rebuild.log <==
scons: Building targets ...
scons: `install-embedded-dev' is up to date.
scons: `install-embedded-test' is up to date.
scons: done building targets.
67.11elapsed
 
==> experiment_nocache_bfl_rebuild.log <==
scons: Building targets ...
scons: `install-embedded-dev' is up to date.
scons: `install-embedded-test' is up to date.
scons: done building targets.
55.35elapsed

Conclusions:

  • On baseline, BFL is a wash with no-BFL for this workload.
  • But on the branch, it is about an 18% win.
  • Additionally, even in the non-BFL case we observe about a 5% improvement.
  • Overall, this means that going from not using BFL on master to using BFL on the branch is about a 24% win. This looks very promising!
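
The percentages above can be recomputed from the `elapsed` lines in the four rebuild logs. This is a small sketch, not part of the test script; the helper names `elapsed_seconds` and `reduction` are made up for illustration.

```python
SUFFIX = "elapsed"


def elapsed_seconds(line):
    """Parse a '/usr/bin/time --format %eelapsed' line such as '72.55elapsed'."""
    text = line.strip()
    if not text.endswith(SUFFIX):
        raise ValueError(f"not an elapsed-time line: {line!r}")
    return float(text[: -len(SUFFIX)])


def reduction(before, after):
    """Fractional reduction in wall-clock time going from before to after."""
    return (before - after) / before


# Last line of each *_nocache_*_rebuild.log shown in the results.
rebuilds = {
    "baseline_nobfl": elapsed_seconds("72.55elapsed"),
    "baseline_bfl": elapsed_seconds("71.30elapsed"),
    "experiment_nobfl": elapsed_seconds("67.11elapsed"),
    "experiment_bfl": elapsed_seconds("55.35elapsed"),
}

comparisons = [
    ("baseline: BFL vs no-BFL", "baseline_nobfl", "baseline_bfl"),
    ("branch: BFL vs no-BFL", "experiment_nobfl", "experiment_bfl"),
    ("no-BFL: branch vs baseline", "baseline_nobfl", "experiment_nobfl"),
    ("no-BFL baseline vs BFL branch", "baseline_nobfl", "experiment_bfl"),
]

for label, before, after in comparisons:
    print(f"{label}: {reduction(rebuilds[before], rebuilds[after]):.1%}")
```

On these four numbers the BFL win on the branch comes out at roughly 17.5% and the overall win at roughly 23.7%, while the no-BFL improvement is closer to 7.5%.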
Comment by Andrew Morrow (Inactive) [ 26/Jun/19 ]

Baseline here is:

$ git merge-base HEAD master
518f382a78f904b2e7adf8ac36acc3a34a82fb4f

Experiment here is:

commit 59404d29e767a3bac0a9c23558e2d136d1caf91a (HEAD, bd/fix_slow_scons_md5_decider_no_exceptions)
Author: William Deegan <bill@baddogconsulting.com>
Date:   Wed Jun 26 14:13:23 2019 -0700
 
    Propagate more changes related to adding repo_node fourth argument to all deciders which where discovered running scons full regression tests. In particular the change to SConf.py applies when using --cache=force which is in somewhat common usage by developers

Builds were done as:

time python3 buildscripts/scons.py --link-model=dynamic --implicit-cache --build-fast-and-loose=on --dbg=on --opt=on --variables-files="/home/andrew/.scons/site_scons/mongo_custom_variables.py etc/scons/mongodbtoolchain_stable_gcc.vars" all -j300 --allocator=system --debug=time

The numbers here are only for rebuilds.

 

Checkout     no-BFL (s)   BFL (s)   Latency (BFL vs no-BFL)
baseline     165.99       135.89    -18%
experiment   145.84       105.35    -27%

Interesting things to note:

  • We were faster with the baseline in BFL mode too. We thought we had a regression?
  • But the gains in the experiment are really worthwhile.

I will rerun the exact tests from SERVER-21336 overnight so we get an apples-to-apples comparison, but this quick test suggests good results.

Comment by Andrew Morrow (Inactive) [ 10/May/19 ]

Definitely a step in the right direction. How much harder would the lazy-stringification approach be? It definitely seems worth trying.

Comment by bdbaddog#1 [ 10/May/19 ]

Initial pass at speedup.

Add Node objects, as well as their string forms, to the dependency map dictionary. First check whether the node is in the map; if it is not, stringify it and check for the string.

 

Yields:

 

      BFL     BFL+node in dict   no BFL
      48.20   44.73              36.28
      48.29   44.56              36.68
      48.19   45.04              36.61
Avg   48.22   44.78              36.52

 

Also interesting: a default mongo build with the command line below yields 7850 lookups that need stringifying and 3076695 hits where comparing the Node object is sufficient.

/usr/bin/time -p python buildscripts/scons.py -j 10 --variables-files=./etc/scons/mongodbtoolchain_stable_gcc.vars --ssl --implicit-cache --disable-warnings-as-errors --modules= --build-fast-and-loose=on --link-model=dynamic

 
It looks like adding strings to the dependency map only when the node lookup fails could be significantly faster.
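
A minimal sketch of that idea follows, assuming a map keyed by node objects whose string index is built only on the first node miss. `Node` and `DependencyMap` here are hypothetical stand-ins written for illustration; they are not the SCons classes and do not reflect the actual patch.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical stand-in for a build-graph node (not the SCons class)."""

    def __init__(self, path: str) -> None:
        self.path = path

    def __str__(self) -> str:
        # In a real build tool, stringifying a node can be comparatively costly.
        return self.path


class DependencyMap:
    """Maps dependencies to stored signatures, trying the Node key first."""

    def __init__(self, entries: Dict[Node, str]) -> None:
        self._by_node = dict(entries)
        self._by_str: Optional[Dict[str, str]] = None  # built on first miss
        self.node_hits = 0
        self.string_lookups = 0

    def lookup(self, node: Node) -> Optional[str]:
        # Fast path: the exact Node object is already a key.
        sig = self._by_node.get(node)
        if sig is not None:
            self.node_hits += 1
            return sig
        # Slow path: stringify, building the string index only when needed.
        self.string_lookups += 1
        if self._by_str is None:
            self._by_str = {str(n): s for n, s in self._by_node.items()}
        return self._by_str.get(str(node))


if __name__ == "__main__":
    a, b = Node("src/a.cpp"), Node("src/b.h")
    dmap = DependencyMap({a: "sig-a", b: "sig-b"})
    print(dmap.lookup(a))                  # found by Node identity
    print(dmap.lookup(Node("src/a.cpp")))  # different object, found by string
    print(dmap.node_hits, dmap.string_lookups)
```

With counts like those reported above (several million node hits against a few thousand string lookups), a build in this sketch would skip stringification on almost every lookup.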

 

Generated at Thu Feb 08 04:52:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.