[DRIVERS-2363] Catch all Astrolabe setup exceptions and mark them as setup failures instead of task failures Created: 17/Jun/22  Updated: 28/Oct/23  Resolved: 08/Sep/22

Status: Closed
Project: Drivers
Component/s: Astrolabe
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Matt Dale Assignee: Matt Dale
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Driver Changes: Not Needed

 Description   

Summary

Some exceptions that can happen during Astrolabe setup are not recognized as part of setup and are registered as task failures instead of setup failures. Update Astrolabe exception handling to catch all exceptions that can happen during setup and defer handling those errors to the "check-cloud-failure" command that runs after the main test run.

Detailed Description

The Atlas cluster setup happens during the run-one command, which is configured as type "test" in the Evergreen config (displays as "test failure"). It seems like we really want to check for cloud failure in the check-cloud-failure command, which is configured as type "setup" in the Evergreen config (displays as "setup failure"). Deferring cloud setup failure checking depends on this try/except block in the runner, which expects a very specific set of exception types and error messages. However, some of those HTTP timeout exceptions are thrown from HTTP calls in _init_ functions (e.g. here) and aren't caught by the try/except block.

The exception handling block that attempts to defer errors caused by cloud setup to a following "setup"-type Evergreen command doesn't handle a lot of possible setup exceptions. We need to refactor the exception handling logic to catch all exceptions that can happen during initialization and cluster setup (e.g. by moving the try/except block to where the runner is initialized and called here).
 

Motivation

Who is the affected end user?

People supporting Astrolabe. Possibly driver devs who are erroneously notified about drivers failures.

How does this affect the end user?

An Astrolabe build will fail with a task failure instead of a setup failure.

How likely is it that this problem or use case will occur?

Reasonably likely, especially if deployments to the "Cloud QA" Atlas environment are happening during an Astrolabe run.

If the problem does occur, what are the consequences and how severe are they?

It is confusing to someone trying to debug the Astrolabe build failure.

Is this issue urgent?

No.

Is this ticket required by a downstream team?

No.

Is this ticket only for tests?

Yes.



 Comments   
Comment by Githook User [ 08/Sep/22 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: DRIVERS-2363 Mark all Atlas test setup exceptions as 'cloud-failure'. (#148)
Branch: master
https://github.com/mongodb-labs/drivers-atlas-testing/commit/ff1f2fb495b8224eeaa57245320ddcc07bcd0e8a

Generated at Thu Feb 08 08:25:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.