- Type: Improvement
- Resolution: Done
- Priority: Minor - P4
- Component/s: Astrolabe
Summary
Astrolabe tests fail intermittently. These failures are often transient, so retrying the failed request would likely succeed. Implement retry logic for our API calls.
Example Stacktrace:
[2024/01/25 23:10:26.073] Traceback (most recent call last):
[2024/01/25 23:10:26.073]   File "Z:\data\mci\887352df31b77292fa8f21d33afe9c8c\astrolabe-src\atlasclient\client.py", line 229, in request
[2024/01/25 23:10:26.073]     response = requests.request(method, url, **request_kwargs)
[2024/01/25 23:10:26.073]   File "Z:\data\mci\887352df31b77292fa8f21d33afe9c8c\astrolabe-src\astrolabevenv\lib\site-packages\requests\api.py", line 59, in request
[2024/01/25 23:10:26.073]     return session.request(method=method, url=url, **kwargs)
[2024/01/25 23:10:26.073]   File "Z:\data\mci\887352df31b77292fa8f21d33afe9c8c\astrolabe-src\astrolabevenv\lib\site-packages\requests\sessions.py", line 589, in request
[2024/01/25 23:10:26.073]     resp = self.send(prep, **send_kwargs)
[2024/01/25 23:10:26.073]   File "Z:\data\mci\887352df31b77292fa8f21d33afe9c8c\astrolabe-src\astrolabevenv\lib\site-packages\requests\sessions.py", line 703, in send
[2024/01/25 23:10:26.073]     r = adapter.send(request, **kwargs)
[2024/01/25 23:10:26.073]   File "Z:\data\mci\887352df31b77292fa8f21d33afe9c8c\astrolabe-src\astrolabevenv\lib\site-packages\requests\adapters.py", line 532, in send
[2024/01/25 23:10:26.073]     raise ReadTimeout(e, request=request)
[2024/01/25 23:10:26.073] requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cloud-qa.mongodb.com', port=443): Read timed out. (read timeout=30.0)
Motivation
Who is the affected end user?
Astrolabe maintainers
How does this affect the end user?
Are they blocked? Are they annoyed? Are they confused?
Transient failures add noise to the test signal; retrying would reduce it.
How likely is it that this problem or use case will occur?
Main path? Edge case?
Main path.
If the problem does occur, what are the consequences and how severe are they?
Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?
Is this issue urgent?
Does this ticket have a required timeline? What is it?
No
Is this ticket required by a downstream team?
Needed by e.g. Atlas, Shell, Compass?
No
Is this ticket only for tests?
Does this ticket have any functional impact, or is it just test improvements?
Yes
Acceptance Criteria
What specific requirements must be met to consider the design phase complete?
Add retry logic to wrap our API calls.
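One possible shape for this, as a minimal sketch rather than the actual Astrolabe change: a decorator that retries a call when it raises a given exception type, waiting with exponential backoff between attempts. The names `retry`, `attempts`, `base_delay`, and `flaky_request` are hypothetical, and the built-in `TimeoutError` stands in for `requests.exceptions.ReadTimeout` so the example runs without third-party packages.

```python
import time
from functools import wraps


def retry(exceptions, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Retry the wrapped callable when it raises one of `exceptions`.

    Hypothetical helper, not part of Astrolabe. After the n-th failed
    attempt (n starting at 0) it waits base_delay * 2**n seconds, and it
    re-raises the last exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == attempts - 1:
                        raise  # out of retries: surface the original error
                    sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator


# Stand-in for an API call that times out twice and then succeeds.
calls = {"count": 0}
delays = []  # records the requested waits instead of actually sleeping


@retry(TimeoutError, attempts=3, base_delay=0.5, sleep=delays.append)
def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Read timed out.")
    return "ok"


result = flaky_request()
```

In Astrolabe, a decorator like this would presumably wrap the `request` method shown in the stack trace, with `requests.exceptions.ReadTimeout` passed as the exception type. Retrying is safest for idempotent calls such as GET requests; whether non-idempotent calls should be retried is a separate decision.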