Details
Improvement
Resolution: Unresolved
Minor - P4
None
None
Server Tooling & Methods
Description
For teams like sharding and replication, JavaScript test reliability often depends on the hardware speed of the machines the tests run on, since multiple nodes are communicating with each other. In the past, slow machines have exposed either flaky tests or actual bugs that wouldn't manifest on a high-performance machine. A dedicated set of variants with controllable levels of "slowness" could therefore be a useful part of our test infrastructure. The slowness parameters could include disk, network, CPU, etc., potentially with separate variants for each type. This could expose our system and tests to kinds of stress that may or may not be explicitly tested currently. Tests that depend on timing and machine speed could be excluded from such a variant.
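As a rough illustration of what "controllable levels of slowness" might look like, the sketch below models one profile per slow variant. Every name here (`SlownessProfile`, the field names, the variant names, and the numeric values) is a hypothetical placeholder rather than an existing Evergreen or resmoke setting, and the sketch only describes the knobs; it does not apply any throttling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SlownessProfile:
    """Hypothetical throttling knobs for one slow variant.

    The defaults mean "no throttling" for that resource.
    """
    name: str
    cpu_fraction: float = 1.0   # assumed: share of a CPU the processes may use
    network_delay_ms: int = 0   # assumed: latency added to each network round trip
    disk_delay_ms: int = 0      # assumed: latency added to each disk operation

    def throttled_resources(self) -> List[str]:
        """Return the resources this profile slows down, in a fixed order."""
        resources = []
        if self.cpu_fraction < 1.0:
            resources.append("cpu")
        if self.network_delay_ms > 0:
            resources.append("network")
        if self.disk_delay_ms > 0:
            resources.append("disk")
        return resources


# Illustrative only: separate variants per resource type, plus a combined one.
SLOW_VARIANTS = [
    SlownessProfile("slow-cpu", cpu_fraction=0.25),
    SlownessProfile("slow-network", network_delay_ms=200),
    SlownessProfile("slow-disk", disk_delay_ms=50),
    SlownessProfile("slow-all", cpu_fraction=0.25, network_delay_ms=200, disk_delay_ms=50),
]

for profile in SLOW_VARIANTS:
    print(profile.name, profile.throttled_resources())
```

Keeping each resource as an independent knob would let a failure be attributed to one kind of slowness before trying the combined variant.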
The main goals of these "slow" variants would be the following:
1. Expose Test Flakiness: Provide stronger and more explicit verification that tests aren't "flaky". That is, a test that shouldn't depend on machine speed should not fail because of a machine speed issue.
2. Expose Timing-Dependent Server Bugs: Provide a more efficient and potentially reproducible way of exposing bugs in the server that only manifest under non-standard system conditions, e.g. an extremely slow network, disk, or CPU.
To achieve the above two goals, we would likely need to determine which of our tests are to be considered "timing-agnostic", and run only those tests on the slow variants, so that timing-dependent tests don't produce extra noise. If these "timing-agnostic" tests are truly valid, they should never fail for the reasons named in goals 1 and 2 above. That is, they are not flaky, and there are no timing-dependent server bugs for them to expose.
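One way the "timing-agnostic" selection could work is by tagging tests and filtering on the tag. The sketch below is a minimal illustration under that assumption: the `timing_dependent` tag name, the `JsTest` type, and the example test paths are all hypothetical and do not refer to an existing tag or to real test files.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical tag marking tests whose outcome depends on timing or machine speed.
TIMING_DEPENDENT_TAG = "timing_dependent"


@dataclass
class JsTest:
    """A JavaScript test file plus the tags attached to it (illustrative model)."""
    path: str
    tags: Set[str] = field(default_factory=set)


def select_timing_agnostic(tests: List[JsTest]) -> List[JsTest]:
    """Keep only the tests that are not tagged as timing dependent."""
    return [t for t in tests if TIMING_DEPENDENT_TAG not in t.tags]


# Placeholder test list; the paths are invented for the example.
all_tests = [
    JsTest("jstests/example/basic_write.js"),
    JsTest("jstests/example/election_timeout.js", {TIMING_DEPENDENT_TAG}),
    JsTest("jstests/example/chunk_migration.js", {"requires_sharding"}),
]

slow_variant_tests = select_timing_agnostic(all_tests)
for test in slow_variant_tests:
    print(test.path)
```

With an opt-out tag like this, every untagged test runs on the slow variants by default, so a failure there points either at a flaky test or at a timing-dependent server bug rather than at an expected timing sensitivity.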
If these slow variants were also integrated into patch build workflows from an early stage, they could act as an extra guard against tests that may introduce flakiness or intermittent failures into the Evergreen master branch.