[SERVER-31112] Create dedicated "slow" machine evergreen variants Created: 15/Sep/17 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | stm | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Server Tooling & Methods
|
| Participants: |
| Description |
|
For teams like sharding and replication, JavaScript test reliability often depends on the hardware speed of the machines the tests run on, since there are multiple nodes communicating with each other. In the past, slow machines have exposed either flaky tests or actual bugs that wouldn't manifest themselves on a high-performance machine. It seems that having a dedicated set of variants with controllable levels of "slowness" could be a useful part of our test infrastructure. The slowness parameters could include disk, network, CPU, etc., with potentially separate variants for different types. This could expose our system and tests to varying types of stress that may or may not be explicitly tested currently. Tests which are dependent on timing and machine speed could likely be ignored by such a variant.

The main goals of these "slow" variants would be the following:

1. Expose Test Flakiness: Provide stronger and more explicit verification that tests aren't "flaky". That is, tests that shouldn't be dependent on machine speed should not fail due to a machine speed issue.

To achieve the above two goals, we would likely need to determine which of our tests are to be considered "timing-agnostic", and run only those tests on the slow variants, so that we don't produce extra noise from tests that are timing-dependent. If these "timing-agnostic" tests truly are valid tests, then they should never fail due to criteria 1 and 2 noted above. That is, they are not flaky, and there are no timing-dependent bugs that they would ever expose.

If these slow variants were also integrated into patch build workflows from an early stage, they could act as an extra guard against tests that may introduce flakiness or intermittent failures into the Evergreen master branch. |
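As a rough illustration of the "timing-agnostic" selection described above, the sketch below filters test files by a tag and describes one slow variant's throttling parameters. Everything named here is an assumption made for illustration: the `timing_agnostic` tag, the `SlowVariant` fields, and the idea that tests declare tags in a `@tags: [...]` comment at the top of the file. It is not an existing Evergreen or resmoke interface.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical tag name; the ticket does not define one.
TIMING_AGNOSTIC_TAG = "timing_agnostic"


@dataclass(frozen=True)
class SlowVariant:
    """Hypothetical description of one "slow" variant's throttling knobs."""

    name: str
    cpu_fraction: float = 1.0               # fraction of normal CPU made available
    disk_iops_limit: Optional[int] = None   # None means disk is not throttled
    network_delay_ms: int = 0               # extra latency added between nodes


# Matches an assumed "@tags: [a, b, c]" declaration, possibly spanning lines.
_TAGS_RE = re.compile(r"@tags\s*:\s*\[(.*?)\]", re.DOTALL)


def parse_tags(test_source: str) -> Set[str]:
    """Return the set of tags declared in a test file's source text."""
    match = _TAGS_RE.search(test_source)
    if match is None:
        return set()
    # Drop the leading "*" of block-comment continuation lines.
    raw = match.group(1).replace("*", " ")
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


def select_for_slow_variant(tests: Dict[str, str]) -> List[str]:
    """Keep only tests tagged timing-agnostic; `tests` maps path to source."""
    return sorted(
        path
        for path, source in tests.items()
        if TIMING_AGNOSTIC_TAG in parse_tags(source)
    )
```

A variant such as `SlowVariant(name="slow-network", network_delay_ms=200)` would then run only the paths returned by `select_for_slow_variant`, so timing-dependent tests add no noise to that variant.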
| Comments |
| Comment by Steven Vannelli [ 10/May/22 ] |
|
Moving this ticket to the Backlog and removing the "Backlog" fixVersion as per our latest policy for using fixVersions. |
| Comment by William Schultz (Inactive) [ 15/Sep/17 ] |
|
max.hirschhorn Do the above goals seem rational to you? |
| Comment by Max Hirschhorn [ 15/Sep/17 ] |
|
Scott had an idea around doing this in |