[SERVER-27251] setup_multiversion_mongodb.py should retry in the case of failures. Created: 01/Dec/16  Updated: 06/Dec/17  Resolved: 18/Oct/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.6.0-rc1

Type: Improvement Priority: Major - P3
Reporter: Sam Kleinman (Inactive) Assignee: Jonathan Abrahams
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-28401 Use Python's requests package in setu... Closed
Backwards Compatibility: Fully Compatible
Sprint: TIG 2017-10-23
Participants:
Linked BF Score: 0

 Description   

We occasionally see download failures from downloads.mongodb.org that clear up after a few seconds, caused by ephemeral issues at Amazon. We might consider adding some retry logic (potentially with backoff and jitter) to the setup_multiversion_mongodb.py script to reduce spurious failures.
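
As a rough illustration of what "retry with backoff/jitter" could look like, here is a minimal sketch. It is not taken from setup_multiversion_mongodb.py; the function name retry_with_backoff, its parameters, and the default values are all hypothetical and chosen only for the example.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(IOError,), sleep=time.sleep):
    """Call operation() until it succeeds or `attempts` tries are used up.

    Hypothetical helper for illustration; assumes attempts >= 1.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with full jitter:
            # sleep a random amount between 0 and the current ceiling.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
```

A caller would pass the download step as a zero-argument callable, for example retry_with_backoff(lambda: download(url)), where download stands in for whatever function performs the transfer. Randomizing the delay is meant to keep many build hosts from retrying in lockstep against the same endpoint.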



 Comments   
Comment by Githook User [ 18/Oct/17 ]

Author: Jonathan Abrahams (jonathan@mongodb.com, username hptabster)

Message: SERVER-28403 setup_multiversion_mongodb.py looks for latest when downloading Major.minor
SERVER-27251 setup_multiversion_mongodb.py should retry in the case of failures
SERVER-28401 setup_multiversion_mongodb.py uses requests package for downloads
Branch: master
https://github.com/mongodb/mongo/commit/18d5b0cea7558f88bbd5dcbec2a762b51cb13c98

Comment by Jonathan Abrahams [ 18/Oct/17 ]

Work for this ticket was subsumed in SERVER-28403

Comment by Zakhar Kleyman [ 21/Feb/17 ]

According to the curl man page:

--retry <num>

If a transient error is returned when curl tries to perform a transfer, it will retry this number of times before giving up. Setting the number to 0 makes curl do no retries (which is the default). Transient error means either: a timeout, an FTP 4xx response code or an HTTP 5xx response code.

In our case we get curl error 18

CURLE_PARTIAL_FILE (18)

A file transfer was shorter or larger than expected. This happens when the server first reports an expected transfer size, and then delivers data that doesn't match the previously given size.

I don't think curl counts that as a transient error, and therefore it doesn't retry.
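
If that is the case, one option would be for the calling script to inspect curl's exit code and rerun the command itself. The following is only a sketch under that assumption: run_curl_with_retry, its parameters, and the delay values are hypothetical, and the run and sleep arguments exist just so the logic can be exercised without touching the network.

```python
import random
import subprocess
import time

CURLE_PARTIAL_FILE = 18  # curl exit code for a transfer that ended short


def run_curl_with_retry(cmd, attempts=5, delay=2.0,
                        run=subprocess.call, sleep=time.sleep):
    """Run a curl command line, rerunning it when it exits with code 18.

    Hypothetical wrapper for illustration. Any other exit code, success
    or failure, is returned to the caller immediately.
    """
    returncode = None
    for attempt in range(attempts):
        returncode = run(cmd)
        if returncode != CURLE_PARTIAL_FILE:
            return returncode
        if attempt < attempts - 1:
            # Wait between delay and 2 * delay seconds before the next try.
            sleep(delay + random.uniform(0, delay))
    return returncode
```

The wrapper leaves curl's own --retry handling in place for the errors curl does consider transient and only adds a second layer for partial transfers.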

Comment by Sam Kleinman (Inactive) [ 05/Dec/16 ]

Just to be clear, curator does not currently have backoff and retry logic for downloading artifacts (it does for syncing repos during repobuilding), but we could definitely add this kind of backoff to artifact downloading.

Comment by Ernie Hershey [ 05/Dec/16 ]

+1 on using curator in the server project and removing the logic from setup_multiversion_mongodb.py if we're going to use it in other projects.

Comment by Sam Kleinman (Inactive) [ 02/Dec/16 ]

I'm not sure how curl's retry system is implemented. I often see failures against S3 even with retries unless there's some backoff or delay (the conventional wisdom is that adding a little jitter also helps). It might be the case that we need to retry more times, or implement our own retry logic with a backoff approach.

We're also in the process of deploying the curator tool (which we use in the server project for packaging) to the build machines, as it has recently acquired mongodb artifact downloading support, which might be a better place to store this logic long term. If fixing this isn't super high priority (which I don't think it should be given the frequency of these issues), we could migrate to using this tool.

Comment by Max Hirschhorn [ 01/Dec/16 ]

sam.kleinman, doesn't the setup_multiversion_mongodb.py script already have retry logic via the --retry option to curl?

proc = subprocess.Popen(["curl",
                         "-L", "--silent",
                         "--retry", "5",
                         "--retry-max-time", "600",
                         "--max-time", "120",
                         "-o", file_name,
                         url],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
