Core Server / SERVER-87423

Add Local Failover in Bazel when EngFlow Remote execution fails

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Build
    • 151

      We need a failover option for when EngFlow remote execution has issues.

      1. We discussed trying a local build when remote execution fails. This seems like the best option, even though it has drawbacks.
      2. We need to investigate whether we can differentiate between a build correctness failure (e.g. a compiler error) and a remote infrastructure issue.
      3. We don't believe we can distinguish between the two, so we will probably need to retry locally whenever anything fails. The drawbacks are longer delays on genuine build failures and potentially more confusing error logs, but it is probably better to be safe on availability.
      4. Local execution is going to be very slow, so this could cause large slowdowns. There isn't really a way around this other than standing up our own backup remote cache system, which is likely prohibitively expensive (although maybe we can do it for critical variants?).
      5. We need to make sure we have active alerting set up to notify us when there is a discrepancy between local and remote build success. We don't want remote builds to fail silently and everything to slow down without us knowing about it.

       

      This ticket covers the work for always retrying a Bazel invocation in local mode when the first remote invocation fails, and for adding a loud alert mechanism for when a remote build fails but the local build succeeds.
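
      A minimal sketch of the retry-and-alert flow described above, written as a Python wrapper around the two invocations. It is an illustration, not the planned implementation: the names `build_with_local_failover`, `BuildResult`, and `subprocess_runner` are hypothetical, the remote and local command lines are passed in by the caller rather than assumed, and `alert` stands in for whatever alerting hook is eventually chosen. Because the ticket assumes infra failures cannot be told apart from build failures, the sketch retries locally on any non-zero exit code.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Run a command, letting its output stream through, and return its exit code."""
    return subprocess.run(list(argv), check=False).returncode


@dataclass
class BuildResult:
    exit_code: int     # exit code of the last invocation that ran
    used_local: bool   # True if the local retry was attempted
    discrepancy: bool  # True if the remote build failed but the local build succeeded


def build_with_local_failover(
    remote_argv: Sequence[str],
    local_argv: Sequence[str],
    run: Runner = subprocess_runner,
    alert: Callable[[str], None] = print,  # placeholder for a real alerting hook
) -> BuildResult:
    """Try the remote invocation first; on any failure, retry in local mode."""
    remote_code = run(remote_argv)
    if remote_code == 0:
        return BuildResult(exit_code=0, used_local=False, discrepancy=False)

    # We assume a remote infra failure cannot be distinguished from a
    # genuine build failure, so every failure triggers a local retry.
    local_code = run(local_argv)
    discrepancy = local_code == 0
    if discrepancy:
        # Remote failed but local succeeded: likely a remote infra problem.
        alert(
            f"remote build failed (exit {remote_code}) but local build succeeded: "
            f"{' '.join(remote_argv)}"
        )
    return BuildResult(exit_code=local_code, used_local=True, discrepancy=discrepancy)
```

      A caller would pass its own command lines, for example `build_with_local_failover(["bazel", "build", "--config=remote", "//..."], ["bazel", "build", "--config=local", "//..."])`, where both `--config` names are assumed to be defined in the project's `.bazelrc` and are not taken from this ticket. When both invocations fail, the wrapper returns the local exit code and raises no alert, since that case looks like a genuine build failure.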

            Assignee: Unassigned
            Reporter: Zack Winter (zack.winter@mongodb.com)
            Votes: 0
            Watchers: 3

              Created:
              Updated: