  1. Core Server
  2. SERVER-35724

Remote EC2 hosts which are not accessible via ssh should fail with system error


Details

    • Task
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • 3.6.7, 4.0.1, 4.1.1
    • None
    • Fully Compatible
    • v4.0, v3.6
    • TIG 2018-07-02
    • 31

    Description

      When the powercycle test "crashes" a remote EC2 instance, the instance sometimes fails to become reachable via ssh again, even though its AWS status still reports "running". The work in SERVER-34996 is intended to help analyze why this may occur.

      To distinguish between a test failure (possible data corruption) and an environment failure, the following should be done:

      • powertest.py should exit in a way that indicates it failed due to ssh
      • If the exit is due to ssh, then a system failure should be triggered (which will show the task as purple)
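
A minimal sketch of how the two points above could fit together, assuming powertest.py reports the cause of failure through its exit code. The constant names, the specific exit-code values, and both helper functions are hypothetical and are not taken from the actual script. The only external fact relied on is that the OpenSSH client exits with status 255 when ssh itself fails (as opposed to the remote command failing).

```python
# Hypothetical exit codes; the ticket does not specify actual values.
EXIT_OK = 0
EXIT_TEST_FAILURE = 1
EXIT_SSH_FAILURE = 2

# The OpenSSH client exits with 255 when ssh itself reports an error.
SSH_CLIENT_ERROR = 255


def classify_remote_result(ssh_returncode):
    """Map the return code of an ssh invocation to a powertest exit code."""
    if ssh_returncode == 0:
        return EXIT_OK
    if ssh_returncode == SSH_CLIENT_ERROR:
        # Host unreachable via ssh: an environment problem, not a test failure.
        return EXIT_SSH_FAILURE
    # Any other non-zero status comes from the remote command itself.
    return EXIT_TEST_FAILURE


def is_system_failure(powertest_exit_code):
    """Return True if the task should be reported as a system failure."""
    return powertest_exit_code == EXIT_SSH_FAILURE
```

One caveat of this approach: a remote command that itself exits with 255 would be misclassified as an ssh failure, so a real implementation might need an additional connectivity check.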

      To help find out why a particular EC2 instance fails to permit ssh, we should also do the following:

      • Termination of the EC2 instance should not be attempted if a system failure occurred due to an ssh issue from powercycle (for non-Windows variants)
      • Increase the expire_hours to 24 (for non-Windows variants)
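
The cleanup rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the actual task configuration: `plan_cleanup`, `SSH_FAILURE_EXIT_CODE`, and the returned dictionary layout are hypothetical. The ticket does not say what expire_hours should be on Windows variants, so the sketch returns `None` there to mean "leave the existing value unchanged".

```python
# Hypothetical exit code signalling that powertest.py failed due to ssh.
SSH_FAILURE_EXIT_CODE = 2


def plan_cleanup(powertest_exit_code, is_windows):
    """Decide whether to terminate the EC2 instance and which expiry to set."""
    ssh_failure = powertest_exit_code == SSH_FAILURE_EXIT_CODE

    # Keep the instance alive for investigation only when a non-Windows
    # variant hit an ssh-related system failure.
    terminate = is_windows or not ssh_failure

    # Non-Windows variants get a 24-hour expiry; None leaves the
    # existing value untouched (Windows behaviour is not specified).
    expire_hours = None if is_windows else 24

    return {"terminate": terminate, "expire_hours": expire_hours}
```

For example, `plan_cleanup(SSH_FAILURE_EXIT_CODE, is_windows=False)` leaves the instance running with a 24-hour expiry, so the host is still available for debugging the ssh problem.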

            People

              jonathan.abrahams Jonathan Abrahams
              jonathan.abrahams Jonathan Abrahams