Details
Bug
Resolution: Fixed
Major - P3
Description
I think two things are happening:
1. We heartbeat and connect/dial with a background context, which takes a long time
on a paused cluster because it doesn't refuse connections immediately.
2. We only stop doing this on <-done, which is signaled by a disconnect, but it is
checked in two select statements that each have roughly a 50/50 chance of picking it,
so I think it's very possible to get unlucky for a long time over many repeated
requests.
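To illustrate point 2, here is a minimal standalone sketch (not the code from connection.go; the channel and function names are made up for the demo). When more than one case of a Go select is ready, the runtime picks one at random, so an already-signaled done only wins about half the time against another case that is always ready:

```go
package main

import "fmt"

// countDoneWins runs a two-case select n times with both cases always ready
// and reports how often the done case was chosen.
// Hypothetical demo helper, not part of the real connection code.
func countDoneWins(n int) int {
	done := make(chan struct{})
	close(done) // simulates a disconnect that has already been signaled

	work := make(chan struct{})
	close(work) // stands in for any other case that is always ready

	wins := 0
	for i := 0; i < n; i++ {
		// Both receives can proceed, so the runtime chooses one at random.
		select {
		case <-done:
			wins++
		case <-work:
		}
	}
	return wins
}

func main() {
	n := 1000
	fmt.Printf("done chosen %d out of %d times\n", countDoneWins(n), n)
}
```

If done has to win two such selects in a row before the loop exits, the chance of stopping on any given pass is lower still, which would fit the slow disconnects.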
Note: uncomment the code in connection.go that uses a single select statement to see
how much faster disconnect becomes.
Attached repro