- Type: Improvement
- Resolution: Fixed
- Priority: Unknown
- Component/s: Astrolabe
Summary
Some exceptions that can happen during Astrolabe setup are not recognized as part of setup and are registered as task failures instead of setup failures. Update Astrolabe exception handling to catch all exceptions that can happen during setup and defer handling those errors to the "check-cloud-failure" command that runs after the main test run.
Detailed Description
The Atlas cluster setup happens during the run-one command, which is configured as type "test" in the Evergreen config, so a failure there displays as "test failure". The intent appears to be that cloud failures are checked in the check-cloud-failure command, which is configured as type "setup" in the Evergreen config (displays as "setup failure"). Deferring cloud setup failure checking depends on this try/except block in the runner, which expects a very specific set of exception types and error messages. However, some of those HTTP timeout exceptions are raised from HTTP calls in __init__ functions (e.g. here), outside the try/except block, so they are not caught.
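The gap can be reproduced in a few lines. This is only an illustrative sketch: FakeTimeout, ClusterRunner, and run_one are hypothetical stand-ins, not Astrolabe's actual classes or functions.

```python
class FakeTimeout(Exception):
    """Hypothetical stand-in for an HTTP timeout raised by an API client."""


class ClusterRunner:
    """Hypothetical runner whose constructor performs an HTTP call."""

    def __init__(self, fail_in_init=False):
        if fail_in_init:
            # Simulates a timeout raised by an HTTP call made during construction.
            raise FakeTimeout("timed out during initialization")

    def run(self):
        # Simulates a timeout raised inside the guarded part of setup.
        raise FakeTimeout("timed out while waiting for the cluster")


def run_one(fail_in_init):
    """Return "deferred" if the guard catches the error; otherwise it propagates."""
    runner = ClusterRunner(fail_in_init=fail_in_init)  # constructed outside the guard
    try:
        runner.run()
    except FakeTimeout:
        return "deferred"
```

With fail_in_init=False the timeout is caught and deferred. With fail_in_init=True the same exception type escapes run_one, which in an Evergreen "test"-type command would surface as a test failure.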
The exception handling block that attempts to defer cloud setup errors to a following "setup"-type Evergreen command misses many possible setup exceptions. We need to refactor the exception handling logic to catch all exceptions that can happen during initialization and cluster setup (e.g. by moving the try/except block to where the runner is initialized and called here).
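One possible shape for the refactor is sketched below, assuming the deferred error is handed to the later command through a marker file. SetupRunner, run_with_deferred_setup_errors, check_cloud_failure, and the marker file are all hypothetical names chosen for illustration; the real Astrolabe code may pass this state differently.

```python
import json
from pathlib import Path


class SetupRunner:
    """Hypothetical runner; both __init__ and run() may raise during setup."""

    def __init__(self, fail_in_init=False):
        if fail_in_init:
            raise TimeoutError("simulated HTTP timeout in __init__")

    def run(self):
        return "tests ran"


def run_with_deferred_setup_errors(marker_path, **runner_kwargs):
    """Construct and run the runner inside a single guard.

    Any exception is recorded in marker_path instead of propagating,
    so that a later step can report it.
    """
    try:
        runner = SetupRunner(**runner_kwargs)  # initialization is now guarded too
        return runner.run()
    except Exception as exc:
        record = {"type": type(exc).__name__, "message": str(exc)}
        Path(marker_path).write_text(json.dumps(record))
        return None


def check_cloud_failure(marker_path):
    """Hypothetical follow-up step: return 1 if a setup failure was recorded."""
    return 1 if Path(marker_path).exists() else 0
```

In this sketch run() stands in for initialization and cluster setup only. A guard this broad should not wrap the test workload itself, or genuine test failures would also be reported as setup failures.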
Motivation
Who is the affected end user?
People supporting Astrolabe. Possibly driver developers who are erroneously notified about driver failures.
How does this affect the end user?
An Astrolabe build will fail with a task failure instead of a setup failure.
How likely is it that this problem or use case will occur?
Reasonably likely, especially if deployments to the "Cloud QA" Atlas environment are happening during an Astrolabe run.
If the problem does occur, what are the consequences and how severe are they?
It is confusing to someone trying to debug the Astrolabe build failure.
Is this issue urgent?
No.
Is this ticket required by a downstream team?
No.
Is this ticket only for tests?
Yes.