  1. Core Server
  2. SERVER-32121

resmoke.py should cause the Evergreen task to system fail if the EC2 instance is terminated

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      TIG 2017-12-18

      Description

      The changes from EVG-1222 added an event to the "host logs" to indicate if the EC2 instance was terminated. Termination has been known to happen (most often on the Windows 2008R2 DEBUG builder) due to increases in the spot price. When the EC2 instance is terminated, a CTRL_SHUTDOWN_EVENT is sent to the mongod processes, causing them to exit and causing tests communicating with those servers to fail. Additionally, an IOError "Interrupted function call" exception is raised to indicate that waiting for the test queue to become empty was interrupted. This leads to a race in which (1) resmoke.py may exit with a nonzero code and (2) the Evergreen agent may run the attach.results command, marking the Evergreen task as failed, before the EC2 instance is terminated.

      We should instead catch the IOError exception and have resmoke.py exit with a specific return code. The "run tests" function in etc/evergreen.yml should then check that return code and defer failing the task to a subsequent shell.exec command of type=system. The existing Jepsen tasks already do this.
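
      A minimal sketch of the proposed resmoke.py behavior, not the actual patch. The exit code value, the function names, and the wait_for_tests callable are all hypothetical stand-ins, and the sketch assumes the "Interrupted function call" IOError surfaces with errno EINTR. Other IOErrors are re-raised so that unrelated failures are not misreported as a host termination.

```python
import errno

# Hypothetical exit code reserved for "waiting on the test queue was
# interrupted"; the value actually chosen for resmoke.py may differ.
EXIT_HOST_INTERRUPTED = 75


def run_and_classify(wait_for_tests):
    """Run wait_for_tests() and return the exit code the process should use.

    wait_for_tests is a stand-in for the call that blocks until the test
    queue is empty and returns the ordinary exit code (0 or nonzero).
    """
    try:
        return wait_for_tests()
    except IOError as err:
        # Assumption: "Interrupted function call" is reported as EINTR.
        # Any other I/O error is unrelated and propagates unchanged.
        if err.errno == errno.EINTR:
            return EXIT_HOST_INTERRUPTED
        raise


def should_system_fail(exit_code):
    """Mirror of the check the "run tests" function would perform."""
    return exit_code == EXIT_HOST_INTERRUPTED
```

      With a dedicated code, the "run tests" function can tell an interrupted run apart from an ordinary test failure and leave the failure to the later type=system command instead of failing the task itself.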

              People

              Assignee:
              kevin.albertson Kevin Albertson
              Reporter:
              max.hirschhorn Max Hirschhorn