Details
Bug
Resolution: Done
Major - P3
Description
Imagine the following scenario:
1. You have a three-node replica set with members at localhost:30000,
localhost:30001, and localhost:30002, where the last of these is the
primary.
2. The seed list as the driver knows it starts with the node at localhost:30000.
3. You run rs.remove('localhost:30000').
4. You step down the current primary to force the driver to reconnect.
In this situation, the Ruby driver would never reconnect. The reason
is that when the Ruby driver tries to connect, it does the following:
1. Iterates through the seed list until it can successfully connect to
a node. In this case, that node was the one we removed
('localhost:30000').
2. Runs isMaster and verifies that the node has a 'hosts' field and
that it has the expected replica set name.
3. Iterates through the hosts list attempting to find a primary node.
In this case, there was just one host in the list ('localhost:30000')
and it was not a primary node.
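To make the failure concrete, here is a minimal simulation of the three steps above. It is only a sketch and not the driver's actual code: the REPLIES table stands in for isMaster responses, and the set name 'rs0', the method name find_primary, and the choice of 'localhost:30001' as the newly elected primary are all assumptions made for illustration.

```ruby
# Hypothetical isMaster replies after rs.remove and the step-down.
# Field values are illustrative, not captured from a real server.
REPLIES = {
  'localhost:30000' => { 'setName' => 'rs0', 'ismaster' => false, 'secondary' => false,
                         'hosts' => ['localhost:30000'] },
  'localhost:30001' => { 'setName' => 'rs0', 'ismaster' => true, 'secondary' => false,
                         'hosts' => ['localhost:30001', 'localhost:30002'] },
  'localhost:30002' => { 'setName' => 'rs0', 'ismaster' => false, 'secondary' => true,
                         'hosts' => ['localhost:30001', 'localhost:30002'] }
}.freeze

SEEDS = ['localhost:30000', 'localhost:30001', 'localhost:30002'].freeze

# Returns the primary's address, or nil if none is found.
def find_primary(seeds, replies, set_name)
  # Step 1: stop at the first seed that accepts a connection.
  seed = seeds.find { |host| replies.key?(host) }
  return nil unless seed

  # Step 2: require a 'hosts' field and the expected set name.
  reply = replies[seed]
  return nil unless reply['hosts'] && reply['setName'] == set_name

  # Step 3: look for a primary among the hosts that this seed reported.
  reply['hosts'].find { |host| replies.dig(host, 'ismaster') }
end
```

With the removed node first in the seed list, find_primary(SEEDS, REPLIES, 'rs0') never looks beyond 'localhost:30000' and finds no primary, even though one exists at 'localhost:30001'.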
The issue is that the removed node still sees itself as a kind of
replica set, albeit a crippled one. I've created a server ticket to
address this issue. You'll see some interesting details here:
https://jira.mongodb.org/browse/SERVER-4731
The solution to the problem for now is to recognize
that a removed node is not part of a healthy replica set. If the node
reports just one host and is neither a primary nor a secondary, all
of which can be deduced from the isMaster command, then you're safe in
concluding that it has been removed and in moving on to the next seed.
More succinctly:
- Number of hosts == 1
- ismaster == false
- secondary == false
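The three conditions can be sketched as a single predicate over an isMaster reply. This is an illustration rather than the driver's actual patch: the method name removed_from_set? and the sample replies are hypothetical, and the reply is assumed to be a Hash with string keys.

```ruby
# True when an isMaster reply looks like it came from a node that has been
# removed from its replica set: one host, neither primary nor secondary.
def removed_from_set?(reply)
  hosts = reply['hosts'] || []
  hosts.size == 1 && !reply['ismaster'] && !reply['secondary']
end

# Hypothetical replies for illustration only.
removed = { 'ismaster' => false, 'secondary' => false,
            'hosts' => ['localhost:30000'] }
healthy = { 'ismaster' => false, 'secondary' => true,
            'hosts' => ['localhost:30001', 'localhost:30002'] }
single  = { 'ismaster' => true, 'secondary' => false,
            'hosts' => ['localhost:30000'] }

# A seed whose reply matches is skipped in favor of the next seed.
usable = [removed, healthy, single].reject { |reply| removed_from_set?(reply) }
```

Note that a legitimate single-member replica set also reports one host, but its node is a primary, so it does not match.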