We currently have several different groups of utility threads including:
- LSM manager threads
- Eviction worker threads
- Async API threads
- Logging threads
Each group of threads uses a different flag to determine whether the system is running, and the state of those flags is maintained entirely within the owning subsystem, i.e. there is no single place that can shut down all utility threads. That makes sense for an orderly shutdown, since there is a defined order in which shutdown must happen. However, it leaves us open to cases where a server thread exits unexpectedly and its "child" threads never exit, because the flag they use to continue running is never reset.
We could make this more robust in several ways including:
- Add a global flag indicating that all threads should shut down immediately, which would be set once the orderly shutdown of subsystems has completed.
- Switch all groups of utility threads to the shared thread group code, and ensure it handles shutdown consistently and correctly.
- Ensure that any time a server thread exits unexpectedly it sets WT_PANIC, and that utility threads check for WT_PANIC regularly.
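As a rough illustration of how the first and third options could fit together, here is a minimal standalone sketch using pthreads and C11 atomics. It is not WiredTiger code: `demo_conn`, `demo_group`, `demo_should_run` and `demo_stop_group` are hypothetical names, and the `panic` field only stands in for a WT_PANIC-style condition. The idea is that every utility thread checks its own subsystem flag plus two connection-wide flags, so it can exit even if nobody resets the subsystem flag.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NTHREADS 4

/* Hypothetical connection-wide state; not WiredTiger's actual structures. */
typedef struct {
    atomic_bool shutdown_all; /* Global "all threads exit now" flag. */
    atomic_bool panic;        /* Stand-in for a WT_PANIC-style condition. */
    atomic_int exited;        /* Number of utility threads that have exited. */
} demo_conn;

/* Hypothetical thread group with its own per-subsystem run flag. */
typedef struct {
    demo_conn *conn;
    atomic_bool run;
} demo_group;

/* A thread keeps running only while all three conditions allow it. */
static bool
demo_should_run(demo_group *group)
{
    return (atomic_load(&group->run) &&
      !atomic_load(&group->conn->shutdown_all) &&
      !atomic_load(&group->conn->panic));
}

static void *
demo_utility_thread(void *arg)
{
    demo_group *group = arg;

    while (demo_should_run(group))
        sched_yield(); /* Placeholder for the thread's real work. */
    atomic_fetch_add(&group->conn->exited, 1);
    return (NULL);
}

/*
 * Start a group, then stop it WITHOUT clearing group.run, simulating a
 * server thread that died before resetting its subsystem flag. Returns
 * the number of utility threads that exited.
 */
int
demo_stop_group(bool via_panic)
{
    demo_conn conn;
    demo_group group;
    pthread_t tids[DEMO_NTHREADS];
    int i, ret;

    atomic_init(&conn.shutdown_all, false);
    atomic_init(&conn.panic, false);
    atomic_init(&conn.exited, 0);
    group.conn = &conn;
    atomic_init(&group.run, true);

    for (i = 0; i < DEMO_NTHREADS; i++) {
        ret = pthread_create(&tids[i], NULL, demo_utility_thread, &group);
        assert(ret == 0);
        (void)ret;
    }

    /* The subsystem flag stays set; only a connection-wide flag changes. */
    if (via_panic)
        atomic_store(&conn.panic, true);
    else
        atomic_store(&conn.shutdown_all, true);

    for (i = 0; i < DEMO_NTHREADS; i++) {
        ret = pthread_join(tids[i], NULL);
        assert(ret == 0);
        (void)ret;
    }
    return (atomic_load(&conn.exited));
}
```

In this sketch, both `demo_stop_group(true)` and `demo_stop_group(false)` return 4, i.e. every thread in the group exits even though its own run flag was never reset. A real implementation would also need to wake threads that are blocked on a condition variable rather than polling, which this sketch does not attempt.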