Type: Bug
Resolution: Done
Priority: Major - P3
Affects Version/s: 2.2.16
Component/s: None

-
None
-
None
-
None
-
None
-
None
-
None
-
None
If the primary server stops responding over its TCP connection but the connection itself is not closed (which can happen with network problems), the driver gets stuck and cannot fail over to the other members of the replica set. The problem is easy to reproduce by running a replica set locally and sending SIGSTOP to the primary server.
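Why SIGSTOP reproduces this can be shown without MongoDB at all. The following is a minimal Python sketch, POSIX only, assuming a hypothetical one-connection echo server started as a child process in place of mongod. Once the process is stopped, its socket stays open (the kernel still accepts incoming data into the receive buffer), so the client sees neither an error nor a reply. Only a client-side read timeout reveals the stall.

```python
import os
import signal
import socket
import subprocess
import sys

# Hypothetical stand-in for a mongod: a one-connection echo server in a child process.
SERVER_CODE = """
import socket
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen()
print(srv.getsockname()[1], flush=True)  # report the chosen port to the parent
conn, _ = srv.accept()
while True:
    data = conn.recv(1024)
    if not data:
        break
    conn.sendall(data)
"""


def probe(sock, timeout):
    """Send a ping and wait at most `timeout` seconds for the echo; None on timeout."""
    sock.settimeout(timeout)
    sock.sendall(b"ping")
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None


def demo():
    proc = subprocess.Popen([sys.executable, "-c", SERVER_CODE],
                            stdout=subprocess.PIPE, text=True)
    port = int(proc.stdout.readline())
    sock = socket.create_connection(("127.0.0.1", port))
    try:
        before = probe(sock, 2.0)          # server running: the echo comes back
        os.kill(proc.pid, signal.SIGSTOP)  # freeze the "primary"; the socket stays open
        during = probe(sock, 0.5)          # no error and no reply: only the timeout fires
        os.kill(proc.pid, signal.SIGCONT)
        return before, during
    finally:
        sock.close()
        proc.kill()
        proc.wait()
        proc.stdout.close()
```

Calling `demo()` returns `(b'ping', None)`: the connection was never reset, so a client without a socket timeout would have blocked indefinitely on the second read.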
I have created a simple test application (setup scripts included) that illustrates the problem:
https://github.com/OleksandrChekhovskyi/mongo-replset-test
Exact repro steps are described in the README file.
The cause seems to be that the ismaster ping is sent to each server sequentially, so if the first server hangs, the driver never obtains the new replica set configuration: it keeps disconnecting from and reconnecting to the stuck server instead of moving on to the others.
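The suspected failure mode can be modelled in a few lines. This Python asyncio sketch is a simplified model, not the driver's code: the host names, delays, and the `is_master` coroutine are hypothetical. It contrasts a scan that visits servers one after another with a scan that checks every server concurrently, each under its own timeout, so one frozen member cannot block discovery of the new primary.

```python
import asyncio

# Hypothetical three-member replica set after a failover. Each entry is
# (reply delay in seconds, reports ismaster); None means "never replies",
# modelling a SIGSTOPped old primary whose connection stays open.
SERVERS = {
    "a:27017": (None, True),   # old primary, frozen
    "b:27017": (0.01, True),   # newly elected primary
    "c:27017": (0.01, False),  # secondary
}


async def is_master(host):
    """Stand-in for an ismaster round-trip; a real check goes over the network."""
    delay, primary = SERVERS[host]
    if delay is None:
        await asyncio.Event().wait()  # blocks forever, like a read on a frozen peer
    await asyncio.sleep(delay)
    return {"host": host, "ismaster": primary}


async def scan_sequential(hosts, budget):
    """One server at a time: a hung server consumes the whole time budget."""
    results = []

    async def run():
        for host in hosts:
            results.append(await is_master(host))

    try:
        await asyncio.wait_for(run(), budget)
    except asyncio.TimeoutError:
        pass  # whatever was collected before the stall is all we get
    return results


async def check(host, timeout):
    """One server with its own timeout; a hang becomes an error for that server only."""
    try:
        return await asyncio.wait_for(is_master(host), timeout)
    except asyncio.TimeoutError:
        return {"host": host, "error": "timeout"}


async def scan_concurrent(hosts, timeout):
    """Every server is checked independently and in parallel."""
    return await asyncio.gather(*(check(h, timeout) for h in hosts))


def find_primary(results):
    """Return the first host that reported ismaster, or None."""
    return next((r["host"] for r in results if r.get("ismaster")), None)
```

With a 0.5 s limit, `find_primary(asyncio.run(scan_sequential(list(SERVERS), 0.5)))` returns `None` because the scan never gets past `a:27017`, while `find_primary(asyncio.run(scan_concurrent(list(SERVERS), 0.5)))` returns `"b:27017"`.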