SERVER-59366 provides basic unit tests; we also need an integration test with fault injection that makes a health check get stuck, triggering a crash in the mongos. We can define a failpoint that stops the HealthObservers from executing checks, so that the statistics used by the progress monitor are no longer updated.
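The mechanism above can be modeled as a small simulation: an observer records when it last completed a check, and a progress monitor crashes the process once an observer has made no progress for longer than the liveness deadline. This is a sketch under stated assumptions, not the mongos implementation; `HealthObserver`, `ProgressMonitor`, `failpoint_enabled` and `simulate` are hypothetical names, time is an integer tick, and "crash" is modeled as raising `RuntimeError`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HealthObserver:
    """Hypothetical model of a health observer (not the real mongos class)."""
    name: str
    last_check_time: float = 0.0
    # Models the proposed failpoint: while enabled, checks stop completing.
    failpoint_enabled: bool = False

    def periodic_check(self, now: float) -> None:
        if self.failpoint_enabled:
            return  # check is "stuck": progress statistics are not updated
        self.last_check_time = now


class ProgressMonitor:
    """Hypothetical monitor: 'crashes' if any observer stalls past the deadline."""

    def __init__(self, observers: List[HealthObserver], liveness_deadline: float):
        self.observers = observers
        self.liveness_deadline = liveness_deadline

    def check(self, now: float) -> None:
        for obs in self.observers:
            if now - obs.last_check_time > self.liveness_deadline:
                raise RuntimeError(f"observer {obs.name} made no progress; crashing")


def simulate(deadline: float, failpoint_at: Optional[int], horizon: int) -> Optional[int]:
    """Return the tick at which the monitor crashes, or None if it never does."""
    obs = HealthObserver("test_observer")  # hypothetical observer name
    monitor = ProgressMonitor([obs], deadline)
    for t in range(horizon):
        if failpoint_at is not None and t == failpoint_at:
            obs.failpoint_enabled = True
        obs.periodic_check(t)
        try:
            monitor.check(t)
        except RuntimeError:
            return t
    return None
```

With a deadline of 10 ticks and the failpoint enabled at tick 5, the last successful check is at tick 4, so `simulate(10, 5, 30)` crashes at tick 15; without the failpoint, `simulate(10, None, 30)` never crashes.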
Test setup:
- Two mongos processes
- A workload that performs write operations in a loop

Steps:
- Start the workload.
- Wait until some writes have completed, to establish a baseline before fault injection.
- Enable the failpoint on one of the mongos processes.
- Let the period returned by FaultManagerConfig::getPeriodicLivenessDeadline elapse while the workload is running; the progress monitor should then crash that mongos.
- Verify that operations are redirected to the other mongos.
- Verify that all writes issued before and after enabling the failpoint are accounted for.
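The test plan can be sketched as a self-contained simulation that checks the same properties: the faulty mongos crashes after the deadline, later writes go through the other mongos, and no write is lost. Everything here is an assumption for illustration: `FakeMongos` and `run_test` are hypothetical stand-ins, the liveness deadline is a tick count, and routing simply picks the first mongos that has not crashed. A real integration test would instead start a cluster with two mongos processes and enable the (not yet defined) failpoint through the `configureFailPoint` command with mode `"alwaysOn"`; in-flight writes that fail at crash time would also need retrying, which this sketch does not model.

```python
from typing import Dict, List, Optional, Tuple


class FakeMongos:
    """Hypothetical stand-in for a mongos process."""

    def __init__(self, name: str, liveness_deadline: int):
        self.name = name
        self.liveness_deadline = liveness_deadline
        self.failpoint_enabled_at: Optional[int] = None
        self.crashed = False

    def enable_failpoint(self, now: int) -> None:
        self.failpoint_enabled_at = now

    def tick(self, now: int) -> None:
        # Progress monitor model: once health checks have been blocked for
        # longer than the liveness deadline, the process crashes.
        if (self.failpoint_enabled_at is not None
                and now - self.failpoint_enabled_at > self.liveness_deadline):
            self.crashed = True


def run_test(liveness_deadline: int = 10, failpoint_at: int = 20,
             total_ticks: int = 50) -> Tuple[List[FakeMongos], List[Dict]]:
    mongos = [FakeMongos("mongos0", liveness_deadline),
              FakeMongos("mongos1", liveness_deadline)]
    stored: List[Dict] = []  # writes acknowledged by the cluster
    for t in range(total_ticks):  # workload: one write per tick
        if t == failpoint_at:
            mongos[0].enable_failpoint(t)
        for m in mongos:
            m.tick(t)
        # Client-side routing: send the write to the first live mongos.
        target = next(m for m in mongos if not m.crashed)
        stored.append({"_id": t, "via": target.name})
    return mongos, stored


mongos, stored = run_test()
crash_tick = min(d["_id"] for d in stored if d["via"] == "mongos1")
print("mongos0 crashed:", mongos[0].crashed, "| first redirected write:", crash_tick)
```

With the defaults, the failpoint is enabled at tick 20 and mongos0 crashes at tick 31, the first tick more than 10 ticks later; every write from then on is served by mongos1, and all 50 writes remain accounted for.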