[SERVER-35724] Remote EC2 hosts which are not accessible via ssh should fail with system error Created: 21/Jun/18 Updated: 29/Oct/23 Resolved: 28/Jun/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.7, 4.0.1, 4.1.1 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jonathan Abrahams | Assignee: | Jonathan Abrahams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | powercycle-infra | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Backport Requested: |
v4.0, v3.6
|
||||||||||||||||
| Sprint: | TIG 2018-07-02 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 31 | ||||||||||||||||
| Description |
|
When a remote EC2 instance is "crashed" by the powercycle test, it sometimes fails to become available via ssh, even though the AWS status still reports it as "running". The work in The following should be done so that we can distinguish between a test failure (possible data corruption) and an environment failure:
In order to help find out why a particular EC2 instance is failing to permit ssh we should also do the following:
|
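A minimal sketch of the distinction described above, assuming a plain TCP probe of the ssh port stands in for a full ssh login check. The function names, the retry parameters, and the "system"/"test"/"success" labels are hypothetical illustrations and are not taken from the powercycle script or from this ticket.

```python
import socket
import time


def ssh_port_reachable(host, port=22, timeout_secs=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_secs):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def wait_for_ssh(host, port=22, attempts=5, delay_secs=1.0, probe=ssh_port_reachable):
    """Probe the host up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if probe(host, port):
            return True
        if attempt + 1 < attempts:
            time.sleep(delay_secs)
    return False


def classify_failure(host_reachable, test_passed):
    """Map the outcome to a label (hypothetical labels, for illustration).

    An unreachable host is an environment ("system") problem; a reachable
    host whose validation failed is a genuine "test" failure.
    """
    if not host_reachable:
        return "system"
    return "success" if test_passed else "test"
```

For example, `classify_failure(wait_for_ssh(address), test_passed=False)` would report "system" when the instance never opens its ssh port after the crash, and "test" when the host comes back but validation fails. An open port does not guarantee that an ssh login succeeds, so a real check would likely also attempt a remote command.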
| Comments |
| Comment by Githook User [ 13/Jul/18 ] |
|
Author: {'username': 'hptabster', 'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com'}Message: |
| Comment by Githook User [ 09/Jul/18 ] |
|
Author: {'username': 'hptabster', 'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com'}Message: (cherry picked from commit ae29cbee182e41c10ca7b1a44e034f9e200a5b90) |
| Comment by Githook User [ 29/Jun/18 ] |
|
Author: {'username': 'hptabster', 'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com'}Message: |
| Comment by Jonathan Abrahams [ 26/Jun/18 ] |
|
We'll disable the Amazon Linux 2 variant due to ssh connection issues after the machine has been internally "crashed". |
| Comment by Max Hirschhorn [ 22/Jun/18 ] |
|
jonathan.abrahams, there are still failures in the BF tickets linked to |
| Comment by Jonathan Abrahams [ 22/Jun/18 ] |
|
Given the finding that the host is not ssh accessible because it cannot boot (this typically occurs on an Amazon Linux 2 instance), we no longer need to skip terminating the EC2 instance or increase its expire_hours. It's not clear why the instance cannot boot; perhaps AWS does not support this scenario. |