- Type: Epic
- Resolution: Won't Do
- Priority: Unknown
- Affects Version/s: None
- Component/s: None
- Done
Speed up patch builds
Summary
Various Go driver test variants currently take 10-15 minutes to complete. A full PR patch build typically takes 25-30 minutes to complete. As we add more tests, our sensitivity to test timeouts will increase and the timeliness of patch build results will decrease. We need to find a way to allow increasing test coverage without further increasing the runtime of the tests. Some possible improvements:
- Scale up test runner hardware to use faster CPUs.
- Scale up local MongoDB clusters to use more replicas or shards to decrease operation durations.
- Scale up test Atlas clusters to use more replicas or shards to decrease operation durations.
- Run more tests in parallel to reduce overall test run duration.
Motivation
Who is the affected end user?
The Go Driver team is the primary group affected.
How does this affect the end user?
The Go Driver team currently spends time waiting for Evergreen patch builds, troubleshooting test failures caused by timeouts, and increasing test timeouts.
How likely is it that this problem or use case will occur?
In the last 6 months, we've opened two tickets for tests that fail consistently:
- https://jira.mongodb.org/browse/GODRIVER-1762
- https://jira.mongodb.org/browse/GODRIVER-2070
Additionally, the duration of the slowest patch variant has increased by about 50% in the last 6 months (as documented by GODRIVER-1762, which increased the test timeout from 10 minutes to 30 minutes; test variants now regularly take 15 minutes).
If the problem does occur, what are the consequences and how severe are they?
If a test times out, the resolution is usually to run it again and hope it runs faster. If that doesn't work, someone must investigate the root cause of the timeout, which may end up being that "the tests just take a long time." If the tests timed out because they're just slow, someone needs to update the test configuration to allow a longer timeout.
If the tests take a long time but don't time out, Go Driver team members are stuck waiting for test results, possibly for hours or even days, depending on how many times unreliable tests must be rerun.
Is this issue urgent?
There are currently some consistent test timeouts on macOS (GODRIVER-2070), although we don't know whether the root cause is a specific defect or just general test slowness.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
Test and development efficiency improvements.
Cast of Characters
Engineering Lead: ?
Document Author: matt.dale
POCers: ?
Product Owner: ?
Program Manager: ?
Stakeholders: Go Driver team