[SERVER-54898] Run the sync utility after initial powercycle setup on Windows Created: 02/Mar/21  Updated: 11/Mar/21  Resolved: 11/Mar/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Mikhail Shchatko
Resolution: Won't Fix Votes: 0
Labels: tig-powercycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-42615 Run chkdsk command on Windows after e... Closed
is related to SERVER-54851 Remove powercycle tasks with internal... Closed
Sprint: STM 2021-03-22
Participants:
Story Points: 2

 Description   

One possible cause of the stability issues we're seeing on Windows is that files are not flushed to disk correctly. We may be able to correct this by running the "sync" utility.

See Max's comment here for more info.



 Comments   
Comment by Robert Guo (Inactive) [ 11/Mar/21 ]

Filed BUILD-12883 to look at the issue further with the BUILD team.

Comment by Mikhail Shchatko [ 11/Mar/21 ]

Running the sync utility didn't make powercycle on Windows any more stable. We are still getting ssh connection issues after a server crash:
https://spruce.mongodb.com/version/6047551c32f41751df136ccd/tasks

Meanwhile, I found steps to reproduce the ssh connection issue:

  1. Spawn a Windows host
  2. Download notmyfault
  3. Run the notmyfaultc64.exe -accepteula crash 1 command that we use in powercycle
  4. After a while, I couldn't connect to the host:

    $ ssh Administrator@ec2-18-203-134-37.eu-west-1.compute.amazonaws.com
    ssh: connect to host ec2-18-203-134-37.eu-west-1.compute.amazonaws.com port 22: Connection refused
    

    I also tried running the shutdown /r /f /t 0 command instead of notmyfault and got the same result.
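
To make step 4 of the repro above less manual, the ssh port can be polled after the crash or reboot. This is only a minimal sketch, not what powercycle itself does: the helper names port_is_open and wait_for_ssh, the retry count, and the delays are all assumptions made for illustration.

```python
import socket
import time


def port_is_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name resolution failures.
        return False


def wait_for_ssh(
    host: str,
    port: int = 22,
    attempts: int = 30,   # assumed retry budget, not taken from powercycle
    delay: float = 10.0,  # assumed pause between attempts, in seconds
    timeout: float = 5.0,
) -> bool:
    """Poll the ssh port until it accepts connections or attempts run out."""
    for attempt in range(attempts):
        if port_is_open(host, port, timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_ssh with the spawned host's public DNS name after running notmyfault (or shutdown /r /f /t 0) returns False when the host never starts accepting connections on port 22 again, which is the failure described above. Note that this only checks that the port accepts TCP connections, not that an ssh login succeeds.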

Most likely, in some cases the setup script (or the AWS user data script) does not run on server startup after a server crash/shutdown.
From the AWS docs:

By default, the user data scripts are run one time when you launch the instance. To run the user data scripts every time you reboot or start the instance, add <persist>true</persist> to the user data.

We have this in our user data, but the script still sometimes doesn't run.
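
For reference, here is a rough sketch of how user data with the persist flag might be assembled. It assumes the layout described in the AWS docs for Windows instances, a <powershell> block followed by <persist>true</persist>. The function name build_user_data and the placeholder script are hypothetical and are not our actual setup script.

```python
def build_user_data(powershell_script: str, persist: bool = True) -> str:
    """Wrap a PowerShell script in EC2 Windows user data tags.

    When persist is True, <persist>true</persist> is appended so that the
    script is meant to run on every reboot or start, not only at first launch.
    """
    parts = ["<powershell>", powershell_script.strip(), "</powershell>"]
    if persist:
        parts.append("<persist>true</persist>")
    return "\n".join(parts)


# Hypothetical placeholder; the real setup script is not shown in this ticket.
example_script = 'Write-Output "setup placeholder"'
user_data = build_user_data(example_script)
```

The resulting string is what would be passed as the instance's user data. As noted above, having the persist tag present has not been enough to make the script run reliably in our case.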
