[SERVER-49928] Add more diagnostics to powercycle tests. Specifically around issuing commands via ssh
Created: 27/Jul/20  Updated: 29/Oct/23  Resolved: 23/Feb/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Vlad Rachev (Inactive)
Resolution: Fixed
Votes: 0
Labels: tig-powercycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Sprint: STM 2020-12-14, STM 2021-02-08, STM 2021-02-22, STM 2021-03-08
Participants:
Linked BF Score: 0
Story Points: 0

 Description   

A test failed because of a pam_nologin error when issuing a command over ssh. Curiously, multiple ssh commands appear to have succeeded between the last reboot and the failure. A good start might be to enable the existing debug flag. Some details that would be useful for understanding the failure:

  • What login name does the command get sent as?
  • Emit a message on each ssh attempt and retry.
  • Assert that a pam_nologin error is one the ssh code retries on.
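
The three items above could be sketched roughly as follows. This is a minimal illustration, not the actual powercycle code: the function names, the `runner` hook, and the list of retryable markers are all hypothetical, and it assumes the pam_nologin failure shows up as text in the ssh output.

```python
import logging
import subprocess
import time

LOGGER = logging.getLogger("powercycle.ssh")  # hypothetical logger name

# Assumption: these substrings identify transient failures worth retrying.
RETRYABLE_MARKERS = ("pam_nologin", "Connection refused", "Connection timed out")


def is_retryable(output):
    """Return True if the ssh output contains a known transient-failure marker."""
    return any(marker in output for marker in RETRYABLE_MARKERS)


def _run_ssh(user_host, command):
    """Run one ssh command and return (exit_code, combined stdout and stderr)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", user_host, command],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def ssh_with_retries(user_host, command, retries=3, delay_secs=5.0, runner=_run_ssh):
    """Run a command over ssh, logging the login name and every attempt."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        # user_host carries the login name, e.g. "someuser@10.0.0.1".
        LOGGER.info("ssh attempt %d/%d as %s: %s", attempt, retries, user_host, command)
        code, output = runner(user_host, command)
        if code == 0:
            return code, output
        retryable = is_retryable(output)
        LOGGER.warning("ssh attempt %d failed (exit %d, retryable=%s): %s",
                       attempt, code, retryable, output.strip())
        if not retryable or attempt == retries:
            return code, output
        time.sleep(delay_secs)
```

Passing a fake `runner` makes it possible to assert in a unit test that a pam_nologin message is retried while an unrelated failure is not, without touching a real host.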

Additionally, if there's a way to track the changes to the pam_nologin file, that might uncover some false assumptions.
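
One possible way to track the file is to snapshot it before and after each ssh command and log any difference. The sketch below is only an illustration: the helper names are hypothetical, it would have to run on the remote host, and the paths are an assumption (pam_nologin checks /etc/nologin; /run/nologin is assumed here for systemd-based hosts).

```python
import os

# Assumption: the nologin files that pam_nologin may consult on the test host.
NOLOGIN_PATHS = ("/etc/nologin", "/run/nologin")


def snapshot_nologin(paths=NOLOGIN_PATHS):
    """Record mtime and contents of each nologin file, or None if unreadable."""
    snapshot = {}
    for path in paths:
        try:
            with open(path, "r", errors="replace") as handle:
                contents = handle.read()
            snapshot[path] = {"mtime": os.stat(path).st_mtime, "contents": contents}
        except OSError:
            # Missing or unreadable files are both recorded as absent.
            snapshot[path] = None
    return snapshot


def describe_changes(before, after):
    """Return human-readable descriptions of differences between two snapshots."""
    changes = []
    for path in after:
        old, new = before.get(path), after[path]
        if old is None and new is not None:
            changes.append("%s appeared" % path)
        elif old is not None and new is None:
            changes.append("%s disappeared" % path)
        elif old != new:
            changes.append("%s changed" % path)
    return changes
```

Logging the output of `describe_changes` alongside each ssh attempt would show whether the file appears or disappears at times other than during a reboot.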



 Comments   
Comment by Githook User [ 22/Feb/21 ]

Author:

{'name': 'vrachev', 'email': 'vlad.rachev@mongodb.com', 'username': 'vrachev'}

Message: SERVER-49928 Clean up powercycle logging
Branch: master
https://github.com/mongodb/mongo/commit/914e04e031fba864dc9bf139bbb96abd91871343

Comment by Brooke Miller [ 24/Nov/20 ]

robert.guo suggested kicking off the powercycle project by implementing the diagnostic tasks that Dan described above.

Comment by Brooke Miller [ 30/Jul/20 ]

Time-box this investigation to 1 day.
