  1. Core Server
  2. SERVER-32121

resmoke.py should cause the Evergreen task to system fail if the EC2 instance is terminated

    • Fully Compatible
    • TIG 2017-12-18

      The changes from EVG-1222 added an event to the "host logs" to indicate if the EC2 instance was terminated. Such terminations have been known to happen (most often on the Windows 2008R2 DEBUG builder) due to increases in the spot price. When the EC2 instance is terminated, a CTRL_SHUTDOWN_EVENT is sent to the mongod processes, causing them to exit and causing tests communicating with those servers to fail. Additionally, an IOError "Interrupted function call" exception is raised to indicate that waiting for the test queue to become empty was interrupted. This leads to a race where (1) resmoke.py may exit with a nonzero code and (2) the Evergreen agent runs the attach.results command, marking the Evergreen task as failed, before the EC2 instance is terminated.

      We should instead catch the IOError exception and have resmoke.py exit with a specific return code. The "run tests" function in etc/evergreen.yml should then check for that return code and defer failing the task to a subsequent shell.exec command of type=system. See here for how this is done with existing Jepsen tasks.
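
      A minimal sketch of the proposed handling, not resmoke.py's actual code. The names exit_code_for and interrupted_wait and the value of INTERRUPTED_EXIT_CODE are assumptions made for illustration; the issue does not specify which return code to use or where in resmoke.py the exception would be caught.

```python
import errno

# Hypothetical value: the issue only asks for "a specific return code".
INTERRUPTED_EXIT_CODE = 75


def exit_code_for(wait_for_queue):
    """Run wait_for_queue() and map an interrupted wait to a dedicated exit code."""
    try:
        wait_for_queue()
    except IOError as err:
        # "Interrupted function call" corresponds to EINTR.
        if err.errno == errno.EINTR:
            return INTERRUPTED_EXIT_CODE
        # Any other I/O error is not treated as an instance termination.
        raise
    return 0


def interrupted_wait():
    """Hypothetical stand-in for the wait on the test queue being interrupted."""
    raise IOError(errno.EINTR, "Interrupted function call")
```

      A caller would pass the result to sys.exit(), so that the "run tests" function can compare the shell's exit status against this value and leave the failure to the later type=system command.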

            Assignee:
            kevin.albertson@mongodb.com Kevin Albertson
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: