[SERVER-28595] Understand the cause of slow incremental rebuild times Created: 03/Apr/17  Updated: 08/Dec/21  Resolved: 30/Jul/20

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Justin Cohler Assignee: Andrew Morrow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File profile_data.pyprof    
Issue Links:
Depends
is depended on by SERVER-14343 Should not take 10s of seconds to do ... Closed
Sprint: Platforms 2017-04-17, Dev Tools 2019-03-25, Dev Tools 2019-04-08, Dev Tools 2019-05-06, Dev Tools 2019-04-22, Dev Tools 2019-08-12, Dev Tools 2019-08-26, Dev Tools 2019-09-09, Dev Tools 2019-09-23, Dev Tools 2019-10-07, Dev Tools 2019-10-21
Participants:

 Comments   
Comment by bdbaddog#1 [ 25/May/17 ]

Notes (all are for null incremental builds)

  • Cached Configure is about 10% of null build runtime (approx 5 seconds).
  • Writing the sconsign file is about 20% of null build runtime (approx 10 seconds).
  • If we can safely enable caching of changed() in FS.py, that would save another 5-10% of runtime (approx 2-5 seconds).
  • The regex calls under get_contents() come from the logic that rebuilds the command line so it can be compared against the previous build's.
  • Of 96k calls to subst, approximately 16k are for blank strings (typically via get_suffix and get_prefix).
  • Of all calls to subst for the mongo build:
      • 24520 contain a $ (457 unique strings)
      • 71517 are plain strings

It looks like there are many places where micro-optimizations may help (changing the list-uniquifying code, for example; see the sketch below).
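
As a concrete illustration of the list-uniquifying point: an order-preserving uniquify that does membership tests against its result list is quadratic, while tracking seen items in a set is linear. This is a minimal sketch of the general technique, not SCons' actual code:

def uniquify_quadratic(seq):
    """O(n^2): linear membership scan of the result list per element."""
    result = []
    for item in seq:
        if item not in result:
            result.append(item)
    return result

def uniquify_linear(seq):
    """O(n): constant-time set lookups, order preserved."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

if __name__ == "__main__":
    data = ["-Ifoo", "-Ibar", "-Ifoo", "-Ibaz", "-Ibar"] * 1000
    assert uniquify_quadratic(data) == uniquify_linear(data)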

Comment by bdbaddog#1 [ 21/Apr/17 ]

A few notes on what I've investigated so far.
I tried commenting out the sconsign writes:
Regular null build: ~55s
With sconsign writing commented out: ~45s
Sadly, pickle writes can't be done incrementally, so for a single-file change there is no improvement to be had here unless sconsign is switched to some other method of persistence.
I did try bumping to the highest pickle protocol (from the current protocol of 1); the difference (on my system) was not measurable (maybe +/- 1s).

Also of note: dblite does have a "dirty" flag to indicate when something it expects to write has changed, and it only writes when dirty. Even with a null build, however, the flag is still getting set, so dblite still writes.

I did see a few benchmarks noting that using JSON for persistence may be an order of magnitude faster. It may be worth investigating; a quick way to measure on our own data shape is sketched below.
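
A quick, self-contained way to run the pickle-vs-JSON comparison on sconsign-shaped data (the payload below is a stand-in for path -> signature records, not the real dblite contents, and the numbers will vary by machine and data shape):

import json
import pickle
import timeit

payload = {
    "src/file%d.cpp" % i: {"csig": "a" * 32, "timestamp": 1491000000 + i}
    for i in range(50000)
}

def dump_pickle():
    return pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

def dump_json():
    return json.dumps(payload)

print("pickle:", timeit.timeit(dump_pickle, number=10))
print("json:  ", timeit.timeit(dump_json, number=10))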

Comment by Andrew Morrow (Inactive) [ 13/Apr/17 ]

We have cherry-picked that change in: https://github.com/mongodb/mongo/commit/59681ee6603fc43f0f3209ec0f9c6a09476edfcc

Comment by bdbaddog#1 [ 13/Apr/17 ]

Looks like you have scons-2.5.0 in your third_party sources.
Any reason you haven't upgraded to 2.5.1?
There is one performance update in 2.5.1, from Alexey Klimkin:

  • Use memoization to optimize PATH evaluation across all dependencies per
    node. (PR #345)

Comment by Andrew Morrow (Inactive) [ 05/Apr/17 ]

bdbaddog -

I've attached profiling data to this ticket; you should be able to view it with the commands from my earlier comment below. Here are some observations:

  • About 11% of the time in _build_targets was spent writing back the sconsign file, even though nothing changed. Ideally, only objects which actually had changes should be re-written (a sketch of change-aware dirty tracking follows this list).
  • We spend about half of the time in changed, over the course of about 20k calls. Many of these calls appear to be redundant. In particular, make_ready_current causes two calls to changed: one direct, and one indirect through release_target_info. It seems likely that either release_target_info does not need to call changed, or that it could at least re-use the knowledge that make_ready_current gleaned from its own call to changed (see the memoization sketch after this list). Also notable is that scan calls into is_up_to_date. This appears to be another potential avenue for re-use, since scan is reached from _find_next_ready_node, make_ready_current is called from make_ready, and both _find_next_ready_node and make_ready are direct children of next_task.
  • The 20k calls to changed are rather pernicious, as they result in 4.5 million calls to the decider subsystem via decide_source. When that call chain is followed, only 9k calls resulted in time spent talking to the filesystem via stat, since the result is memoized. However, it appears that the memoization is performed at too deep a layer in the object structure. If the memoization were moved higher, a lot of Python execution time could be removed.
  • About half of the calls to changed spend significant time in get_contents. In this case, we aren't actually getting the file contents, since we are using the MD5-Timestamp decider. Instead, we are just doing substitution. It is not clear why this is so expensive, though a lot of it seems to be time spent in regex. Perhaps a faster regex library than re is in order? Or perhaps there is something we can do to reduce the substitution load we are asking of SCons? (One cheap possibility is sketched after this list.)
  • Moving back up to _build_targets, we spend about 14% of our time in get_stored_implicit, because we are building with --implicit-cache. The majority of that time is spent in str_to_node. Much like the amplification in the calls to changed, get_stored_implicit makes 4k calls to prepare_dependencies, which cause 1.5 million calls to str_to_node. Improving the runtime of str_to_node itself, and, if possible, re-using state to avoid redundant calls to it, seems like a potential avenue for improvement (see the caching sketch after this list).
  • Also of interest: 12k calls to _find_next_ready_node cause somewhat more than 3 million calls to iter_, for almost 14% of the total time in _build_targets. It is not clear to me what part of _find_next_ready_node is doing this or whether it can be improved, but it seems worth looking into.
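
On the first bullet (re-writing sconsign when nothing changed), one direction is to compare against the stored value before flagging the database dirty, so a null build that re-stores identical entries never triggers a write at all. This is an illustrative stand-in, not dblite's actual code:

import pickle

class SignatureDB(object):
    def __init__(self):
        self._entries = {}
        self._dirty = False

    def set(self, key, value):
        # Only flag dirty on a real change; a null build that re-stores
        # identical entries then never flips the flag.
        if self._entries.get(key) != value:
            self._entries[key] = value
            self._dirty = True

    def write(self, path):
        if not self._dirty:
            return  # nothing changed: skip the full re-pickle
        with open(path, "wb") as f:
            pickle.dump(self._entries, f, pickle.HIGHEST_PROTOCOL)
        self._dirty = False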
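
On the redundant changed() calls, a minimal sketch of per-node memoization (class layout and names here are illustrative, and a real implementation would need to invalidate the memo when a node is rebuilt):

class Node(object):
    def __init__(self, name):
        self.name = name
        self._changed_memo = None  # None = not yet computed

    def _compute_changed(self):
        # Stand-in for the real decider work (stat, csig comparison, ...).
        print("computing changed() for %s" % self.name)
        return False

    def changed(self):
        if self._changed_memo is None:
            self._changed_memo = self._compute_changed()
        return self._changed_memo

    def make_ready_current(self):
        up_to_date = not self.changed()  # first call: computes and caches
        self.release_target_info()
        return up_to_date

    def release_target_info(self):
        if not self.changed():           # second call: served from the memo
            pass                         # ... discard build info ...

Node("foo.o").make_ready_current()       # "computing ..." prints only once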
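
On the substitution load: given the earlier observation that roughly 71k of 96k subst calls are plain strings, a cheap fast path that skips expansion entirely when no '$' is present could help. This is a hypothetical sketch, with a trivial expander standing in for SCons' real subst machinery:

def subst_fast(s, full_subst):
    if "$" not in s:          # plain string: no expansion needed
        return s
    return full_subst(s)

env = {"CC": "gcc"}
def full_subst(s):
    for k, v in env.items():
        s = s.replace("$" + k, v)
    return s

print(subst_fast("plain-string", full_subst))  # fast path, returned as-is
print(subst_fast("$CC -c foo.c", full_subst))  # falls through to real subst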
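
And on str_to_node: since get_stored_implicit replays the same dependency strings over and over, a memoizing wrapper keyed on the string would collapse the 1.5 million calls to one per unique string. The names below are illustrative; in real SCons the translation also depends on the lookup environment, which a real fix would have to account for:

_node_cache = {}

def str_to_node_cached(s, str_to_node):
    try:
        return _node_cache[s]
    except KeyError:
        node = str_to_node(s)
        _node_cache[s] = node
        return node

def fake_str_to_node(s):
    return ("NODE", s)  # real code would walk the FS tree

for dep in ["a.h", "b.h", "a.h"] * 500000:
    str_to_node_cached(dep, fake_str_to_node)
print(len(_node_cache))  # 2 unique translations instead of 1.5M calls
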
Comment by Andrew Morrow (Inactive) [ 05/Apr/17 ]

I'm attaching profile data for a no-op SCons rebuild, run as:

python -m cProfile -o profile_data.pyprof buildscripts/scons.py --implicit-cache --dbg=on --variables-files=etc/scons/xcode_macosx.vars --debug=time,count,memoizer all -j8

You can view this data with:

pyprof2calltree -i ./profile_data.pyprof -k
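
If you don't have pyprof2calltree and a kcachegrind-compatible viewer installed, the standard library's pstats module can read the same cProfile output; for example:

import pstats

p = pstats.Stats("profile_data.pyprof")
p.sort_stats("cumulative").print_stats(25)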
