[SERVER-26224] Resmoke gets much slower with more parallelism Created: 21/Sep/16  Updated: 06/Dec/17  Resolved: 27/Mar/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.5.6

Type: Bug Priority: Major - P3
Reporter: Geert Bosch Assignee: Eddie Louie
Resolution: Done Votes: 1
Labels: tig-resmoke
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Related
is related to SERVER-24729 stagger the launching of resmoke jobs Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

% resmoke -j24 --suite=decimal

yields: All 28 test(s) passed in 200.13 seconds,
but

% resmoke -j1 --suite=decimal

yields: All 28 test(s) passed in 22.45 seconds.

Sprint: TIG 2017-04-17
Participants:

 Description   

With -j24, test suites now take at least 200 seconds to execute, while some used to run in about 5 seconds. This slows down development and probably adds to CI costs.

The following fixes it:

diff --git a/buildscripts/resmokelib/testing/executor.py b/buildscripts/resmokelib/testing/executor.py
index 3628fa0..0ce5ef9 100644
--- a/buildscripts/resmokelib/testing/executor.py
+++ b/buildscripts/resmokelib/testing/executor.py
@@ -149,10 +149,6 @@ class TestGroupExecutor(object):
                 t.daemon = True
                 t.start()
                 threads.append(t)
-                # SERVER-24729 Need to stagger when jobs start to reduce I/O load if there
-                # are many of them.  Both the 5 and the 10 are arbitrary.
-                if len(threads) >= 5:
-                    time.sleep(10)
 
             joined = False
             while not joined:

With this,

% resmoke -j24 --suite=decimal

yields: All 28 test(s) passed in 5.41 seconds.
This is a 37-fold speedup over the 200.13 seconds measured with -j24 before the change.
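
The roughly 200-second floor seen with -j24 is consistent with the stagger logic removed in the diff above: once five threads have been started, every further thread start (including the fifth) is followed by a 10-second sleep. The sketch below only reproduces that arithmetic; the function name and its parameters are hypothetical and are not part of resmoke.

```python
def stagger_delay(num_jobs, threshold=5, sleep_secs=10):
    """Return the total seconds spent sleeping while launching num_jobs threads.

    Hypothetical helper that mirrors the removed lines: after each thread is
    appended, sleep if at least `threshold` threads have been started so far.
    """
    started = 0
    total = 0
    for _ in range(num_jobs):
        started += 1                # corresponds to threads.append(t)
        if started >= threshold:    # corresponds to len(threads) >= 5
            total += sleep_secs     # corresponds to time.sleep(10)
    return total


print(stagger_delay(24))                # 200: threads 5 through 24 each add 10 s
print(stagger_delay(1))                 # 0: a single job never reaches the threshold
print(stagger_delay(24, sleep_secs=1))  # 20: the same launch with a 1-second sleep
```

Under this model, -j24 pays a fixed 200 seconds of sleeping regardless of how quickly the tests themselves run, which matches the 200.13 seconds reported in the steps to reproduce.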



 Comments   
Comment by Githook User [ 27/Mar/17 ]

Author: Eddie Louie (elouie99) <eddie.louie@mongodb.com>

Message: SERVER-26224 Add --staggerJobs option to resmoke.py
Branch: master
https://github.com/mongodb/mongo/commit/bc5c3286fba8cdb40fa6b2c195712075e3a05a1f

Comment by Max Hirschhorn [ 21/Mar/17 ]

I'm hesitant to lessen or remove altogether the staggering of resmoke.py jobs until we better understand the system resource utilization of our test suites. What I'd like to do for this ticket for now is to add a new --stagger-jobs flag to resmoke.py that defaults to being off and only enable it in Evergreen. This way local testing won't be impacted by this issue.
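
An opt-in stagger flag that defaults to off, as proposed in the comment above, could be sketched roughly as follows. This is an illustration only and not resmoke.py's actual option handling: the parser, the boolean form of the flag, and the launch_delay helper are all assumptions, and the threshold of 5 and sleep of 10 are taken from the diff in the description.

```python
import argparse


def build_parser():
    """Hypothetical stand-alone parser; resmoke.py's real parser differs."""
    parser = argparse.ArgumentParser(prog="resmoke-sketch")
    parser.add_argument("-j", "--jobs", type=int, default=1,
                        help="number of jobs to run in parallel")
    # Off by default, so local runs are not slowed down; CI would pass it explicitly.
    parser.add_argument("--staggerJobs", dest="stagger_jobs",
                        action="store_true", default=False,
                        help="stagger the start of jobs to reduce I/O load")
    return parser


def launch_delay(started, stagger_jobs, threshold=5, sleep_secs=10):
    """Seconds to wait after the `started`-th thread has been launched."""
    if stagger_jobs and started >= threshold:
        return sleep_secs
    return 0


local = build_parser().parse_args(["-j", "24"])
ci = build_parser().parse_args(["-j", "24", "--staggerJobs"])
print(launch_delay(5, local.stagger_jobs))  # 0: staggering is off by default
print(launch_delay(5, ci.stagger_jobs))     # 10: staggering was requested explicitly
```

With this shape, the executor would call time.sleep only when launch_delay returns a non-zero value, so the default local invocation never sleeps between thread starts.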

Comment by Eric Milkie [ 21/Sep/16 ]

I think "10" in the sleep is too much. Perhaps "1" or "2" would be sufficient.
