For a client pool, the background topology scanner runs a complete scan of all servers after heartbeatFrequencyMS has passed (or sooner, if a scan is requested).
The background scan uses mongoc_topology_scan_once, which fans out "ismaster" commands to every server and waits for all responses (or timeouts) before another scan can be scheduled.
A big problem with this is that a single slow server can block the next scheduled scan of all the other servers: the timeout for an "ismaster" during scanning is connectTimeoutMS, which may exceed heartbeatFrequencyMS. This scenario can easily happen:
1. Scan requested
2. "ismaster" is sent to servers X and Y
3. X responds quickly, but Y hangs for connectTimeoutMS.
4. The "ismaster" to Y times out after connectTimeoutMS.
5. The background thread sees that more than heartbeatFrequencyMS has passed and starts a new complete scan.
I've reproduced this behavior by modifying example-sdam-monitoring.c to override the stream initiator and simulate a slow connection to one server.
Running it against a two-node replica set shows the behavior: the second heartbeat to localhost:27017 is blocked by the 20-second connection timeout.
This behavior is unavoidable for single-threaded scans, but it should not occur with multi-threaded scans. Each server should be scanned on its own interval, which also aligns better with the server monitoring spec.