After our meeting about file corruption, it occurred to me that we could provide a mode that allows restarting when a WT log file is corrupted. To the best of my knowledge, corrupted log files (often the result of not running a file system integrity check on restart) are almost always the last log file created. I'm not aware of any case where the corrupted file was in the middle of the data stream.
We could consider adding a form of catastrophic recovery for some cases. WT can detect whether the corrupted file is the last one from the previous run. If restart has been attempted multiple times, WT can instead check whether all log files after the corrupted one are empty of data records, since WT creates the next log file right away on each attempt. If either condition holds, then instead of failing when it reads the corrupted log file, WT could truncate the log and run recovery to the end of the last good log file.
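A rough sketch of that eligibility check follows, assuming a scan of the log directory has already summarized each file. The struct log_file_info, its fields, and can_truncate_at_corruption are hypothetical names for illustration, not existing WiredTiger APIs. The function finds the first corrupted file and allows truncation only if every later file contains no data records.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-file summary from a scan of the log directory.
 * These names are illustrative, not WiredTiger internals.
 * Assumption: real log file numbers are non-zero, so 0 can mean "none". */
struct log_file_info {
    uint32_t fileid;       /* log file number */
    bool corrupted;        /* scan hit an unreadable or bad-checksum record */
    uint64_t data_records; /* number of data records found in the file */
};

/*
 * Decide whether the log can be truncated at the first corrupted file.
 * `files` must be sorted by ascending fileid. On success, *last_good is set
 * to the fileid recovery should stop after (0 if no earlier file exists).
 * Returns false if nothing is corrupted or if any later file holds data.
 */
static bool
can_truncate_at_corruption(
  const struct log_file_info *files, size_t n, uint32_t *last_good)
{
    size_t i, j;

    /* Locate the first corrupted log file. */
    for (i = 0; i < n; i++)
        if (files[i].corrupted)
            break;
    if (i == n)
        return false; /* no corruption: normal recovery applies */

    /* Repeated restart attempts may have created new log files after the
     * corrupted one; truncation is safe only if none of them hold data. */
    for (j = i + 1; j < n; j++)
        if (files[j].data_records != 0)
            return false; /* corruption is mid-stream: refuse to truncate */

    *last_good = (i == 0) ? 0 : files[i - 1].fileid;
    return true;
}
```

When the corrupted file is the last one, the second loop has nothing to examine, so the "last file from the previous run" case and the "several restart attempts" case are handled by the same check.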
The user may lose any data that was written into the corrupted log, and this would only work if all checkpoints were before any LSN in the corrupted log. But it is a path to getting the user's database back up and running, and it matches the manual steps that support often takes or suggests.
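The checkpoint condition can be sketched the same way. This assumes a hypothetical struct lsn holding a log file number and a byte offset, ordered by file first and then by offset; lsn_cmp and checkpoints_precede_corruption are illustrative names, not WiredTiger functions. A checkpoint is acceptable only if its LSN is strictly before the start of the corrupted file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSN: log file number plus byte offset within that file. */
struct lsn {
    uint32_t file;
    uint32_t offset;
};

/* Order LSNs by file number, then by offset within the file. */
static int
lsn_cmp(struct lsn a, struct lsn b)
{
    if (a.file != b.file)
        return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* True only if every checkpoint LSN precedes the start of the corrupted
 * file, i.e. no checkpoint depends on records in the truncated log. */
static bool
checkpoints_precede_corruption(
  const struct lsn *ckpts, size_t n, uint32_t corrupt_file)
{
    const struct lsn corrupt_start = {corrupt_file, 0};
    size_t i;

    for (i = 0; i < n; i++)
        if (lsn_cmp(ckpts[i], corrupt_start) >= 0)
            return false;
    return true;
}
```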
Is related to: SERVER-36633 Use WiredTiger log file salvage to recover a corrupted journal