  WiredTiger / WT-14101

Reduce PR test time for python test suite

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines
    • StorEng - Defined Pipeline

      Some work we did in SLS-1041 may give an interesting idea for further parallelizing WiredTiger python test runs.

      Currently we have over 600 test files, and each may have multiple tests along with some scenarios, so over 14000 individual tests are run.  Our current strategy for parallelism is to run 12 evergreen jobs.  Each job looks at the same list of tests, numbers them, and then job X takes all tests where test-number mod 12 == X.  The resulting tests for job X are then executed in sequence.  The run.py option used for this is -b, for "batching".  This works, but does not exploit the parallelism available in each machine.  The trouble is that Python threads don't parallelize well, especially when they call into C a lot (google "python GIL").  We also have a Python library (concurrencytools) that creates multiple Python processes, but that has had issues in the past.
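To make the current batching scheme concrete, here is a minimal sketch of the mod-12 selection described above. It is not the code in run.py; the function name `select_batch` and the test names are made up for illustration.

```python
def select_batch(tests, job, num_jobs=12):
    """Return the tests that job number `job` would run under mod-N batching.

    Every job sees the same ordered list, numbers the tests from 0, and
    keeps only those whose number is congruent to its own job number.
    """
    return [t for i, t in enumerate(tests) if i % num_jobs == job]


# Hypothetical test names, for illustration only.
tests = [f"test_example{n:02d}" for n in range(30)]
batches = [select_batch(tests, job) for job in range(12)]
```

Each test lands in exactly one batch, but the tests inside a batch still run one after another, which is the part this ticket wants to parallelize.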

      In SLS-1041, we are using a new shell script tools/pytest_parallel to manage the parallel processes.  There we do it on a test file basis (since we're running a subset of the test files in the suite).  For this ticket, we'd probably want to run it on an individual test basis (since we'll still use batching, and we'll have a list of tests rather than files).  For a first try, we could run an individual python instance on one test in a single process.  And we could run N processes (probably depending on number of cores in the machine).  We could probably get faster by having each python instance run a handful of tests, so we'd be starting up Python, loading the wiredtiger library, etc. just once per handful.
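As a rough sketch of the "N processes, a handful of tests each" idea, something like the following could work. This is not tools/pytest_parallel, which is a shell script; the helper names are hypothetical, and `base_cmd` is a placeholder for however a Python instance would be started on a list of tests (whether run.py can be invoked that way is an assumption here).

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def chunk(tests, size):
    """Split the list of tests into consecutive groups of at most `size`."""
    return [tests[i:i + size] for i in range(0, len(tests), size)]


def run_chunk(base_cmd, group):
    """Run one child process over a handful of tests.

    `base_cmd` is a placeholder command list; the test names in `group`
    are appended as arguments. Returns (group, exit code).
    """
    proc = subprocess.run(base_cmd + group, capture_output=True, text=True)
    return group, proc.returncode


def run_parallel(base_cmd, tests, chunk_size=5, workers=None):
    """Run the chunks in up to `workers` child processes at a time.

    Returns the list of chunks whose process exited with a non-zero status.
    """
    workers = workers or os.cpu_count() or 1
    groups = chunk(tests, chunk_size)
    # The threads only wait on child processes, so the GIL is not a
    # bottleneck here; the real work happens in the separate processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda g: run_chunk(base_cmd, g), groups))
    return [group for group, rc in results if rc != 0]
```

The chunk size is the knob for the startup-cost tradeoff mentioned above: a size of 1 is the "first try" of one test per process, and larger sizes pay for starting Python and loading the wiredtiger library once per handful.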

      It probably takes some small changes to run.py and evergreen.yml to try out this concept.  Getting it working on Windows might present the larger challenge. It may be easiest to do this ticket once the SLS work is merged into WT develop.

      BTW, the python test suite bucket tasks in evergreen consistently take the longest to run, so even a moderate speedup (4x on an 8 core machine?) could really speed up build times.

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            donald.anderson@mongodb.com Donald Anderson
            Votes:
            0
            Watchers:
            4

              Created:
              Updated: