  1. Core Server
  2. SERVER-35724

Remote EC2 hosts which are not accessible via ssh should fail with system error

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.7, 4.0.1, 4.1.1
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      TIG 2018-07-02
    • Linked BF Score:
      31

      Description

      When the powercycle test "crashes" a remote EC2 instance, the instance sometimes does not become reachable via ssh again, even though its AWS status still reports it as "running". The work in SERVER-34996 is intended to help analyze why this occurs.

      The following should be done so that we can distinguish a test failure (possible data corruption) from an environment failure:

      • powertest.py should exit in a way that indicates it failed due to ssh
      • If the exit is due to ssh, a system failure should be triggered (which shows the task as purple)
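
A minimal sketch of the two points above, assuming a dedicated exit code is used to signal an ssh failure. The names `EXIT_SSH_FAILURE`, `classify_exit`, `is_system_failure` and `ssh_reachable`, as well as the exit code values, are hypothetical and not taken from powertest.py. The only external fact relied on is that the `ssh` client itself exits with status 255 when the connection fails.

```python
import subprocess

# Hypothetical exit codes for illustration; powertest.py may use other values.
EXIT_OK = 0
EXIT_TEST_FAILURE = 1
EXIT_SSH_FAILURE = 2

# The ssh client returns 255 when the connection itself fails; otherwise it
# returns the exit status of the remote command.
SSH_CONNECTION_ERROR = 255


def ssh_reachable(host, timeout_secs=10, run=subprocess.run):
    """Return True if a trivial remote command can be run on host via ssh.

    `run` is injectable so the check can be exercised without a network.
    The remote command `true` assumes a non-Windows host.
    """
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",  # never prompt for a password
        "-o", f"ConnectTimeout={timeout_secs}",
        host,
        "true",
    ]
    result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode != SSH_CONNECTION_ERROR


def classify_exit(ssh_ok, test_ok):
    """Map the outcome of a run to a process exit code."""
    if not ssh_ok:
        return EXIT_SSH_FAILURE
    return EXIT_OK if test_ok else EXIT_TEST_FAILURE


def is_system_failure(exit_code):
    """A wrapper around the script would report a system failure for this code."""
    return exit_code == EXIT_SSH_FAILURE
```

With this split, the script would end with something like `sys.exit(classify_exit(ssh_ok, test_ok))`, and the task wrapper would inspect the exit code to decide between a test failure and a system failure. How the system failure is actually reported to the CI system is not shown here.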

      To help find out why a particular EC2 instance is not permitting ssh, we should also do the following:

      • Do not attempt to terminate the EC2 instance if a system failure occurred due to an ssh issue from powercycle (for non-Windows variants)
      • Increase the expire_hours to 24 (for non-Windows variants)
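
The cleanup rules above can be sketched as a small decision function. This is an illustration only: `CleanupPlan`, `plan_cleanup` and the `windows_expire_hours` parameter are hypothetical names, and the Windows expiry is passed in by the caller because the ticket does not state it. The sketch reads the 24 hour expiry as applying to all non-Windows variants, which is an interpretation of the ticket.

```python
from dataclasses import dataclass

# Value named in the ticket for non-Windows variants.
NON_WINDOWS_EXPIRE_HOURS = 24


@dataclass(frozen=True)
class CleanupPlan:
    terminate_instance: bool
    expire_hours: int


def plan_cleanup(ssh_system_failure, is_windows, windows_expire_hours):
    """Decide whether to terminate the remote instance and how long it may live.

    On non-Windows variants an instance that failed due to ssh is left
    running so it can be inspected; it is reclaimed when it expires.
    """
    if is_windows:
        return CleanupPlan(terminate_instance=True, expire_hours=windows_expire_hours)
    return CleanupPlan(
        terminate_instance=not ssh_system_failure,
        expire_hours=NON_WINDOWS_EXPIRE_HOURS,
    )
```

Keeping the decision in one pure function makes the Windows and non-Windows branches easy to check separately from the code that calls AWS.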
