[SERVER-42615] Run chkdsk command on Windows after each powercycle loop
Created: 02/Aug/19 | Updated: 29/Oct/23 | Resolved: 02/Aug/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.1, 4.3.1 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Max Hirschhorn | Assignee: | Max Hirschhorn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tig-powercycle |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.2 |
| Sprint: | STM 2019-08-12 |
| Participants: | |
| Linked BF Score: | 26 |
| Story Points: | 2 |
| Description |
|
We've seen a variety of errors during powercycle testing on Windows after upgrading to Windows Server 2016, none of which are indicative of a MongoDB issue:
We should run the chkdsk command in read-only mode (i.e. without any extra parameters) to see if we can collect diagnostics indicating the NTFS volume is corrupt after using notmyfault.exe to crash the machine. |
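A minimal sketch of what the read-only check could look like, assuming the powercycle driver is a Python script that can shell out on the Windows host. The helper names `build_chkdsk_command` and `check_volume` are hypothetical and not taken from the actual change; the only external command used is `chkdsk <drive>:` with no switches, as described above.

```python
import subprocess
from typing import Callable, List, Tuple


def build_chkdsk_command(drive: str) -> List[str]:
    """Build a read-only chkdsk invocation (no extra switches) for one drive.

    Hypothetical helper: accepts "c", "C:", or "C:\\" and normalizes to "C:".
    """
    letter = drive.rstrip(":\\/").upper()
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"expected a single drive letter, got {drive!r}")
    return ["chkdsk", f"{letter}:"]


def check_volume(drive: str, runner: Callable = subprocess.run) -> Tuple[int, str]:
    """Run chkdsk on the drive and return (exit code, combined output).

    The output is returned rather than interpreted so it can be written to the
    test log as diagnostics. check=False because a non-zero exit code is itself
    the information we want to collect, not a reason to abort the loop.
    """
    result = runner(
        build_chkdsk_command(drive),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout


# Possible use after each powercycle loop (Windows only):
#   code, output = check_volume("C:")
#   print(f"chkdsk exited with {code}\n{output}")
```

The `runner` parameter is there only so the helper can be exercised without a Windows host; in real use the default `subprocess.run` would be left in place.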
| Comments |
| Comment by Githook User [ 13/Aug/19 ] |
|
Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>
Message: (cherry picked from commit e6ef0ca20e99b2b3a6682952c2588e6e2d1ba8a9) |
| Comment by Max Hirschhorn [ 05/Aug/19 ] |
|
jonathan.reams had another theory: before starting the first powercycle loop, we're simply not waiting for the data (e.g. the mongod.exe executable that was scp'd over) to have been durably written to disk. mark.benvenuto had mentioned that the "Error performing inpage operation" message means we tried to call CreateProcess() on a binary that couldn't be fully read from disk (i.e. an unrecoverable page fault error), so that at least fits with the theory. https://docs.microsoft.com/en-us/sysinternals/downloads/sync looks to be a utility we can use to ensure the contents of the C:, D:, and E: drives are all flushed if we see more failures. |
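A rough sketch of how the flush step might be wired in, if it turns out to be needed. It assumes the Sysinternals `sync` executable is on the PATH as `sync.exe` and accepts the drive letters to flush as positional arguments (e.g. `sync c d e`); both are assumptions to verify against the utility's own usage output. `build_sync_command` and `flush_drives` are hypothetical names.

```python
import subprocess
from typing import Callable, Iterable, List


def build_sync_command(drives: Iterable[str], sync_exe: str = "sync.exe") -> List[str]:
    """Build a command line that asks sync to flush the given drives.

    Assumption: drive letters are passed as bare positional arguments.
    """
    letters = []
    for drive in drives:
        letter = drive.rstrip(":\\/").lower()
        if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
            raise ValueError(f"expected a single drive letter, got {drive!r}")
        letters.append(letter)
    return [sync_exe, *letters]


def flush_drives(drives: Iterable[str], runner: Callable = subprocess.run) -> bool:
    """Flush the drives and report whether the command exited with code 0."""
    result = runner(build_sync_command(drives), check=False)
    return result.returncode == 0


# Possible use before starting the first powercycle loop (Windows only):
#   if not flush_drives(["C:", "D:", "E:"]):
#       print("sync reported a failure; files may not be durable yet")
```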
| Comment by Githook User [ 02/Aug/19 ] |
|
Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>
Message: |