In SERVER-33187, a customer database on a Windows system stopped archiving log files. Pre-allocation of log files also stopped. FTDC data showed nothing else out of the ordinary, and checkpoints were happening normally. I determined that the log server thread had exited and found its error message in the logs.
The system kept running, but nothing was left to perform that thread's tasks, so WT log files kept accumulating indefinitely. On this Windows system, an attempt to remove a log file had failed with Access Denied/EPERM.
We should review internal thread error path handling. Perhaps any internal thread error should be fatal and cause a panic. Or internal threads should handle errors more specifically and perhaps retry for some potentially transient errors.
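To make the two options above concrete, here is a minimal sketch of a combined policy: retry a bounded number of times on errors that might be transient, and escalate to a fatal error otherwise. It is written in Python for brevity and is not WiredTiger code. The names `run_with_policy`, `ThreadPanic`, and `TRANSIENT_ERRNOS` are hypothetical, and treating EACCES/EPERM/EBUSY as possibly transient is an assumption made for illustration, not a conclusion from this ticket.

```python
import errno
import time

# Assumption for illustration only: errno values we treat as possibly transient.
TRANSIENT_ERRNOS = {errno.EACCES, errno.EPERM, errno.EBUSY}


class ThreadPanic(Exception):
    """Hypothetical stand-in for a fatal, process-wide panic."""


def run_with_policy(step, max_retries=3, delay=0.0, sleep=time.sleep):
    """Run one unit of internal-thread work under a retry-or-panic policy.

    `step` is any callable that may raise OSError (for example, a log
    file removal). Possibly-transient errors are retried up to
    `max_retries` times; any other error, or exhausting the retries,
    raises ThreadPanic instead of letting the thread exit silently.
    """
    attempts = 0
    while True:
        try:
            return step()
        except OSError as exc:
            attempts += 1
            if exc.errno in TRANSIENT_ERRNOS and attempts <= max_retries:
                sleep(delay)  # back off before trying again
                continue
            raise ThreadPanic(
                f"internal thread failed after {attempts} attempt(s): {exc}"
            ) from exc
```

The point of the sketch is that neither branch lets the thread disappear quietly: the work either eventually succeeds or the failure becomes visible to the whole system.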
- is duplicated by
SERVER-35842 Server stopped on wiredtiger log access
- is related to
SERVER-33187 Journal data are not cleared by WiredTiger