  1. Evergreen
  2. EVG-17316

Enable rebuilding failed tasks automatically

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • None
    • Backlog
    • product
    • None

    Description

      When testing Kubernetes Operators in multi-cloud scenarios, our tasks occasionally fail because of issues outside our control, such as networking problems, timeouts, or slow Kubernetes cluster nodes. We'd like to request an automatic rebuild capability in Evergreen. This would significantly increase our productivity, since we currently re-trigger those tasks manually through the UI.

      Within the Kubernetes Team we discussed a few implementation ideas:

      • A GitHub hook that listens for an "evergreen retry failed" command. Product teams could easily automate this scenario, for example as a GitHub Action.
      • A REST endpoint at the patch level that determines which tasks failed and re-triggers them.
      • Flags in evergreen.yml that mark a task as eligible for automatic rebuilds. This approach would let teams control how their flaky tasks are handled.
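
      To make the second and third ideas concrete, here is a minimal sketch of the selection step: given the tasks of a patch, pick the failed ones that are marked as eligible. The TaskResult record, the "failed" status string, and the auto_rebuild flag are assumptions made for illustration. They are not existing Evergreen fields or evergreen.yml options.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical view of one task in a patch (not an Evergreen type)."""
    task_id: str
    status: str         # assumed status strings, e.g. "success" or "failed"
    auto_rebuild: bool  # hypothetical opt-in flag, e.g. set per task in evergreen.yml


def tasks_to_rebuild(tasks: Iterable[TaskResult]) -> List[str]:
    """Return the IDs of failed tasks that opted in to automatic rebuilds."""
    return [t.task_id for t in tasks if t.status == "failed" and t.auto_rebuild]


if __name__ == "__main__":
    patch_tasks = [
        TaskResult("e2e_multi_cloud", "failed", auto_rebuild=True),
        TaskResult("unit_tests", "failed", auto_rebuild=False),
        TaskResult("lint", "success", auto_rebuild=True),
    ]
    print(tasks_to_rebuild(patch_tasks))
```

      A hook or endpoint could then re-trigger only the returned task IDs, so tasks that did not opt in still require a manual restart.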

      We strongly recommend an upper bound on the number of automatic rebuilds, so that rebuilds cannot get out of control.
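
      One possible shape for such a bound is sketched below: a task is rebuilt only while it keeps failing and the number of rebuilds so far is below a configured limit. The names should_rebuild, run_with_rebuilds, and max_rebuilds are hypothetical, and run_task stands in for whatever actually executes the task. None of this reflects existing Evergreen behavior.

```python
from typing import Callable, Tuple


def should_rebuild(status: str, execution: int, max_rebuilds: int) -> bool:
    """Decide whether a finished run may be rebuilt automatically.

    `execution` is 0 for the first run and grows by 1 with each rebuild,
    so it equals the number of rebuilds performed so far.
    """
    return status == "failed" and execution < max_rebuilds


def run_with_rebuilds(
    run_task: Callable[[int], str], max_rebuilds: int
) -> Tuple[str, int]:
    """Run a task, rebuilding it on failure at most `max_rebuilds` times.

    Returns the final status and the number of rebuilds that were used.
    """
    execution = 0
    while True:
        status = run_task(execution)
        if not should_rebuild(status, execution, max_rebuilds):
            return status, execution
        execution += 1


if __name__ == "__main__":
    # Simulated flaky task: fails on its first two runs, then succeeds.
    flaky = lambda execution: "failed" if execution < 2 else "success"
    print(run_with_rebuilds(flaky, max_rebuilds=3))
```

      With a limit of N, a task runs at most N + 1 times in total, which keeps a permanently broken task from being rebuilt indefinitely.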

      xref: https://mongodb.slack.com/archives/C0V896UV8/p1657782556658069

          People

            backlog-server-evg Backlog - Evergreen Team
            sebastian.laskawiec@mongodb.com Sebastian Laskawiec