[SERVER-36162] Powercycle - ensure internal crash command has been executed on the remote host Created: 17/Jul/18 Updated: 29/Oct/23 Resolved: 13/Sep/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.9, 4.0.3, 4.1.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jonathan Abrahams | Assignee: | Jonathan Abrahams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tig-powercycle | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.0, v3.6 | ||||||||
| Sprint: | TIG 2018-09-24 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 0 | ||||||||
| Story Points: | 5 | ||||||||
| Description |
|
It's possible that, due to an ssh connection error, the remote command to internally crash a server will never run. The powertest.py script expects that the crash command will fail, as the ssh connection will be terminated. However, it should examine the output of the crash command to determine if it was actually run on the remote host. Here's a case where the remote command failed to execute:
|
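One way to make the crash command's output checkable, as the description proposes, is to have the remote shell print a marker just before the crash is triggered. The sketch below is illustrative only: `wrap_with_marker`, `crash_command_started`, and the marker string are hypothetical names, not part of powertest.py, and it assumes the remote side runs a POSIX shell. The client may still lose the output if the connection drops first, so a missing marker is a hint rather than proof that the command did not run.

```python
import shlex

# Hypothetical marker string; any value unlikely to appear in ssh errors works.
CRASH_MARKER = "POWERCYCLE_CRASH_STARTED"


def wrap_with_marker(command, marker=CRASH_MARKER):
    """Prefix a remote shell command so it echoes a marker before running."""
    return "echo {}; {}".format(shlex.quote(marker), command)


def crash_command_started(output, marker=CRASH_MARKER):
    """Return True if the captured output shows the remote shell began executing."""
    return marker in output
```

If the marker is present, the crash command at least started on the remote host; if only an ssh error message came back, the command most likely never ran.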
| Comments |
| Comment by Githook User [ 21/Sep/18 ] |
|
Author: {'name': 'Jonathan Abrahams', 'email': 'jonathan@Jonathans-MacBook-Pro.local'} Message: (cherry picked from commit f4d62c2ba9a27dc03663779d0817bc399ab2e91f) |
| Comment by Githook User [ 20/Sep/18 ] |
|
Author: {'name': 'Jonathan Abrahams', 'email': 'jonathan@Jonathans-MacBook-Pro.local'} Message: (cherry picked from commit f4d62c2ba9a27dc03663779d0817bc399ab2e91f) |
| Comment by Githook User [ 13/Sep/18 ] |
|
Author: {'name': 'Jonathan Abrahams', 'email': 'jonathan@Jonathans-MacBook-Pro.local'} Message: |
| Comment by Max Hirschhorn [ 31/Jul/18 ] |
|
remote_operations.py, and thus powercycle, has no way to distinguish whether the errors that come back from a remote command are from SSH itself or from the commands being run through SSH. To handle retries on SSH errors more precisely, we need logic that can detect the source of the error. |
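A minimal sketch of such source detection, assuming the OpenSSH client is used: `ssh` exits with the remote command's status, or with 255 if ssh itself hit an error. Since a remote command can also exit with 255, the sketch additionally requires a known ssh message in the output. `classify_remote_error` and the pattern list are hypothetical and do not reflect the actual remote_operations.py code.

```python
SSH_CLIENT_ERROR_EXIT_CODE = 255  # OpenSSH exit status for its own errors

# Hypothetical list of messages treated as coming from ssh itself.
SSH_ERROR_MESSAGES = (
    "Connection timed out during banner exchange",
    "Connection refused",
)


def classify_remote_error(exit_code, output):
    """Return 'ok', 'ssh', or 'command' for a finished remote invocation."""
    if exit_code == 0:
        return "ok"
    from_ssh = exit_code == SSH_CLIENT_ERROR_EXIT_CODE and any(
        message in output for message in SSH_ERROR_MESSAGES
    )
    return "ssh" if from_ssh else "command"
```

Requiring both the exit code and a message match is deliberately conservative: an unrecognized failure is attributed to the command, so it is not retried blindly.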
| Comment by Jonathan Abrahams [ 23/Jul/18 ] |
|
The ssh output is returned as "Connection timed out during banner exchange", so we can retry on this. Before running the next loop in powertest.py (for the server crash scenarios), we need to ensure that the server has been restarted (by examining the uptime). |
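The uptime check could look roughly like the following. This is a sketch under stated assumptions: the remote host is Linux and its uptime is read from `/proc/uptime` (first field, in seconds) before the crash and again afterwards; `parse_proc_uptime`, `has_rebooted`, and the tolerance value are hypothetical. Other platforms would need a different uptime source.

```python
def parse_proc_uptime(text):
    """Return the uptime in seconds from the contents of /proc/uptime."""
    return float(text.split()[0])


def has_rebooted(uptime_before, uptime_after, elapsed_secs, tolerance_secs=5.0):
    """Decide whether the host restarted between two uptime samples.

    Without a restart, uptime_after is roughly uptime_before + elapsed_secs.
    After a restart, uptime_after is at most elapsed_secs, so it falls short
    of that sum by about uptime_before.
    """
    return uptime_after + tolerance_secs < uptime_before + elapsed_secs
```

For example, a host with 1000 seconds of uptime that reports 30 seconds a minute later has restarted, while one that reports about 1060 seconds has not.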
| Comment by Max Hirschhorn [ 22/Jul/18 ] |
jonathan.abrahams, isn't it possible that the client won't even observe the output of the crash command because it has been disconnected from the remote host as part of running the crash command? It isn't clear to me the kind of change you are proposing to make to powertest.py. Separately, should we add "Connection timed out during banner exchange" to this list of ssh errors that remote_operations.py knows to retry on? |
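As a rough illustration of retrying on a list of known ssh errors, the sketch below wraps any callable that returns an `(exit_code, output)` pair. `RETRYABLE_SSH_ERRORS` and `run_with_ssh_retries` are hypothetical names; the real list and retry logic in remote_operations.py may differ. For the crash command specifically, a retry is only safe when the error shows the command never reached the remote host.

```python
import time

# Hypothetical retry list; only the message quoted in this ticket is included.
RETRYABLE_SSH_ERRORS = ("Connection timed out during banner exchange",)


def run_with_ssh_retries(run_command, max_attempts=3, delay_secs=0.0):
    """Call run_command() until it stops failing with a retryable ssh error.

    run_command must return (exit_code, output). Returns the final
    (exit_code, output, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        exit_code, output = run_command()
        retryable = exit_code != 0 and any(
            error in output for error in RETRYABLE_SSH_ERRORS
        )
        if not retryable or attempt == max_attempts:
            return exit_code, output, attempt
        time.sleep(delay_secs)
```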