  1. Core Server
  2. SERVER-24689

Support automatic compile bypass for non-source code changes in Evergreen patch builds

    • Fully Compatible
    • TIG 2018-1-15, TIG 2018-1-1, TIG 2017-12-18, TIG 2018-1-29, TIG 2018-02-12, TIG 2018-02-26

      TL;DR - Patch builds can now automatically bypass the compile step for non-source code changes, saving about 20-30 minutes of patch build time. The bypass is applied to evergreen.yml in a consistent manner whenever the compile step can be skipped, which mainly benefits engineers working on test infrastructure and evergreen.yml changes.

      For patch builds, we implemented a method to automatically determine whether we can bypass the compilation step given the files modified in the patch. This feature has been requested for some time so that engineers get patch build results back sooner. The idea is that if the files modified in the patch are deemed not to require a re-compilation of source files, then we should be able to retrieve pre-existing binaries (say, from the base commit) and use those for any other tasks in the build variant. Skipping the compile step reduces patch build time by about 20-30 minutes.

      We settled on an approach to use a whitelist of files and directories and exception lists to determine whether we should skip the compile step. This approach is a conservative method where we decide ahead of time which files and directories are considered to not require a compile. We include exception lists to explicitly call out files and directories that may overlap the whitelists or otherwise require the compile step. In essence, if all modified patch files belong to the whitelists and are not explicitly excepted, then we bypass the compile step. All other file changes will induce compilation.
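
      The decision rule above can be sketched in Python as follows. This is only an illustration: the list contents and the function names are hypothetical and are not taken from buildscripts/bypass_compile_and_fetch_binaries.py, and treating an empty patch as "compile" is an assumption made here to stay conservative.

```python
# Hypothetical lists for illustration only; the real lists live in the bypass script.
WHITELIST_FILES = {"etc/evergreen.yml"}
WHITELIST_DIRS = ("buildscripts/", "jstests/")
EXCEPTION_FILES = {"buildscripts/errorcodes.py"}
EXCEPTION_DIRS = ("buildscripts/idl/",)


def _matches(path, files, dirs):
    """Return True if path is one of the listed files or lies under a listed directory."""
    return path in files or path.startswith(tuple(dirs))


def requires_compile(path):
    """A file requires compile if it is excepted or is not covered by the whitelists."""
    if _matches(path, EXCEPTION_FILES, EXCEPTION_DIRS):
        return True
    return not _matches(path, WHITELIST_FILES, WHITELIST_DIRS)


def should_bypass_compile(patch_files):
    """Bypass only if every modified file is whitelisted and none is excepted."""
    files = [p.strip() for p in patch_files if p.strip()]
    # Assumption: with no modified files we cannot justify a bypass, so compile.
    return bool(files) and not any(requires_compile(p) for p in files)
```

      Checking the exception lists first means that an excepted file always forces compilation, even when it sits inside a whitelisted directory.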

      In evergreen.yml, we added three new functions to implement this feature:
      1. "get modified patch files"
      2. "bypass compile and fetch binaries"
      3. "update bypass expansions"

      The "get modified patch files" function simply runs "git diff HEAD --name-only" to find all the files that were modified in your patch build and stores their names in a file called patch_files.txt. We check these files against the whitelists and exception lists to determine whether to bypass compile.
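
      For illustration, the same step could be written in Python roughly as below. The actual function is defined in evergreen.yml; the helper names here are hypothetical, and only the git command and the patch_files.txt file name come from the description above.

```python
import subprocess


def parse_name_only(output):
    """Split the output of "git diff --name-only" into a list of non-empty paths."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_modified_patch_files(repo_dir="."):
    """Run "git diff HEAD --name-only" in repo_dir and return the modified paths."""
    result = subprocess.run(
        ["git", "diff", "HEAD", "--name-only"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_name_only(result.stdout)


def write_patch_files(paths, out_path="patch_files.txt"):
    """Store one modified path per line, ready for the whitelist check."""
    with open(out_path, "w") as handle:
        handle.write("".join(path + "\n" for path in paths))
```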

      The "bypass compile and fetch binaries" function passes patch_files.txt to a new Python script, buildscripts/bypass_compile_and_fetch_binaries.py, which contains all the logic for deciding whether to bypass the compile step. If the script determines that compilation is required, it generates no files, and the default setting causes the compile step to run. If compilation is bypassed, it generates two files that are used later in evergreen.yml. The first is bypass_compile_expansions.yml, which contains the expansion macros used in evergreen.yml to bypass compile. The second is artifacts.json, which contains the URL links to the base commit artifacts (binaries, mongo-shell and mongo-debugsymbols) used in place of the artifacts that would normally be generated during the compilation step. Of note, only certain files are extracted from the base commit's artifacts.tgz 'artifact'; all others come from the patch build in order to preserve any modified patch files.
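
      As a rough sketch of the two generated files, assuming that the expansions file is a simple "key: value" YAML document and that artifacts.json is a list of name/link/visibility entries: the helper names and the example.com URL below are hypothetical and do not reflect the script's actual contents.

```python
import json


def build_expansions_text():
    """Contents for bypass_compile_expansions.yml (assumed "key: value" YAML)."""
    return "bypass_compile: true\n"


def build_artifact_entries(artifact_links):
    """Map display names to base commit URLs in the assumed artifacts.json layout."""
    return [
        {"name": name, "link": url, "visibility": "public"}
        for name, url in artifact_links.items()
    ]


def write_bypass_files(artifact_links,
                       expansions_path="bypass_compile_expansions.yml",
                       artifacts_path="artifacts.json"):
    """Write both files; intended to be called only when compile is bypassed."""
    with open(expansions_path, "w") as handle:
        handle.write(build_expansions_text())
    with open(artifacts_path, "w") as handle:
        json.dump(build_artifact_entries(artifact_links), handle, indent=2)
```

      A call such as write_bypass_files({"Binaries": "https://example.com/base-commit/binaries.tgz"}) would then produce both files; when compilation is required, neither file is written and the default behavior applies.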

      The "update bypass expansions" function simply applies the expansions in bypass_compile_expansions.yml. The main macro is bypass_compile, the knob that, if enabled, bypasses the compile step in evergreen.yml.

      A caveat:
      If you use a very recent commit as the base commit of your patch build, its artifacts may not be available yet. In that case the compile bypass may be aborted and compilation runs as usual.

      Other notes:
      You can disable this compile bypass feature by commenting out the one line in the compile task that calls the function "bypass compile and fetch binaries".

      We also tried the evergreen fetch CLI command to download the base commit artifacts in one step. This afforded us parallel downloads, but ultimately we decided to go with an implementation using the new Evergreen command attach.artifacts instead.

      The new Evergreen command attach.artifacts gives us access to the task/<taskID>/files endpoint and conveniently allows us to associate pre-existing URLs to a particular task. This saves us the duplication of storage and the cost of downloading these artifacts.

            Assignee:
            eddie.louie Eddie Louie
            Reporter:
            michael.grundy Michael Grundy