Currently, in cases such as a retryable error, the cluster is scanned synchronously on the main thread. This is the same behavior that RUBY-1357 eliminated during client construction, and it suffers from the same problems:
- The rescan covers all servers, and the main thread is blocked while each server is queried, even though the topology may already have identified a writable server.
- Servers are scanned sequentially.
- There is no time bound on the rescan.
A solution similar to RUBY-1357 needs to be implemented for all of the other rescans. One way to do this, I imagine, is to request an immediate recheck of all servers and then wait up to the server selection timeout for a usable server.
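A rough sketch of that idea follows. It is illustrative only: `TopologyWaiter`, `publish_writable` and `select_writable` are hypothetical names, not the driver's API, and each "monitor" is modeled as a callable that requests a scan without blocking the caller. It also assumes the server that produced the error has already been marked unknown elsewhere.

```ruby
# Hypothetical sketch; none of these names exist in the driver.
class TopologyWaiter
  def initialize
    @lock = Mutex.new
    @changed = ConditionVariable.new
    @writable = nil
  end

  # Called from a monitor thread when it discovers a writable server.
  def publish_writable(address)
    @lock.synchronize do
      @writable = address
      @changed.broadcast
    end
  end

  # Requests an immediate check from every monitor, then waits up to
  # `timeout` seconds. Returns the writable server's address, or nil
  # if none was found before the timeout elapsed.
  def select_writable(monitors, timeout)
    monitors.each(&:call) # stand-in for "request immediate recheck"
    deadline = now + timeout
    @lock.synchronize do
      until @writable
        remaining = deadline - now
        return nil if remaining <= 0
        # Wakes on broadcast or after `remaining` seconds, whichever is first.
        @changed.wait(@lock, remaining)
      end
      @writable
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end
```

With this shape the calling thread returns as soon as any monitor reports a writable server, the servers are checked in parallel by their own monitors, and the wait is bounded by the timeout passed in.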
One difficulty here is that the monitor thread is currently not designed to be interruptible: it unconditionally sleeps for the heartbeat interval. This will need to be changed.
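One possible way to make the sleep interruptible is to wait on a condition variable with a timeout instead of calling `sleep`. The sketch below is an assumption about how this could look, not the monitor's actual code; `InterruptibleSleeper`, `sleep_for` and `wake` are hypothetical names.

```ruby
# Hypothetical helper; not part of the driver.
class InterruptibleSleeper
  def initialize
    @lock = Mutex.new
    @cond = ConditionVariable.new
    @woken = false
  end

  # Sleeps for up to `interval` seconds. Returns true if #wake cut the
  # sleep short, false if the full interval elapsed.
  def sleep_for(interval)
    @lock.synchronize do
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + interval
      # Loop to guard against spurious wakeups.
      until @woken
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0
        @cond.wait(@lock, remaining)
      end
      woken = @woken
      @woken = false # reset so the next sleep starts fresh
      woken
    end
  end

  # Called from another thread to request an immediate check.
  def wake
    @lock.synchronize do
      @woken = true
      @cond.signal
    end
  end
end
```

A monitor loop could then call `sleep_for(heartbeat_interval)` between checks, and any thread that needs a fresh scan would call `wake`. Because the flag is kept until the next `sleep_for`, a wake request that arrives while the monitor is busy checking is not lost.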