[SERVER-4731] Removed replica set nodes should not appear as members of the replica set. Created: 20/Jan/12  Updated: 09/Nov/12  Resolved: 16/Feb/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.2
Fix Version/s: 2.1.1

Type: Bug
Priority: Major - P3
Reporter: Kyle Banker
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-5058 mongos should update config seed base... Closed
Operating System: ALL
Participants:

 Description   

When you call rs.remove(), the removed node still looks a lot like a member of the replica set.

  • It still responds to rs.status().
  • It claims to be syncing from a node in the original replica set.
  • It has a state of STARTUP.
  • It has the same name as the old replica set.

To avoid confusion on the client side, removed nodes should immediately terminate.



 Comments   
Comment by Eric Milkie [ 16/Feb/12 ]

Need to add the new state to the state table in "Replica Set Commands".

Comment by auto [ 14/Feb/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-4731 use simpler english
Branch: master
https://github.com/mongodb/mongo/commit/5984ac837f0fab4eeac4c72e933c2685939e65de

Comment by auto [ 14/Feb/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-4731 add new state for removed replicaset members

New behavior: when you remove a replica set node, its state changes
to SHUNNED until it is either shut down or readded to a replica set.
After it is removed, the removed node sees only itself in the
replica set configuration.
Branch: master
https://github.com/mongodb/mongo/commit/c0e0f9d07a1758075f5ac3196ab10c5d1aeece76

Comment by Eric Milkie [ 07/Feb/12 ]

Replica set nodes used to terminate when they were removed. This feature was added with 8a0def48 and removed with 759e1e646.
To support the use case of removing a node and then re-adding it (without shutting down the node!), I'll be looking at cleaning up the state transition.

Comment by Gustavo Niemeyer [ 30/Jan/12 ]

I spent some time on this scenario over the weekend. Some related comments:

1. After the master connection drops, there's a window of time when
the driver can connect to the removed secondary node and still get an
ismaster command result saying it's a secondary. So ismaster=false
+ secondary=false may be a reliable indicator that a server must
be disregarded as a cluster member, but the absence of that combination
is not enough to say that the server is part of the cluster ("if",
rather than "iff").

2. As expected the list of replica set members returned by the master
is immediately correct once a reconnection succeeds, so that seems to
be the best way to correctly fix the cluster topology after the
removal event.

3. It's curious that the primary drops all connections on the removal,
but the removed node itself does not.
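
Point 2 above can be sketched as follows. This is only an illustration in Python, not the Go driver's code: it assumes the driver keeps a collection of known host addresses and has the hosts list from the master's ismaster reply after reconnecting. The function name reconcile_members and the addresses in the usage note are hypothetical.

```python
from typing import Iterable, Set, Tuple


def reconcile_members(
    known: Iterable[str], primary_hosts: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Treat the master's host list as authoritative.

    Returns (members, dropped): members is exactly the set reported by the
    master; dropped holds previously known hosts that are no longer listed
    and should therefore be disregarded.
    """
    members = set(primary_hosts)
    dropped = set(known) - members
    return members, dropped
```

For example, with known hosts a:27017, b:27017 and c:27017 and a master that now reports only a:27017 and b:27017, c:27017 ends up in the dropped set.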

I'm hoping to release an updated version of the Go driver in the next
few days that handles this in a better fashion, including a proper
test verifying the driver (and server) behavior.

Comment by Gustavo Niemeyer [ 22/Jan/12 ]

For the record, as Kyle commented elsewhere, the following scheme is also a workaround to detect when a host should be ignored for the moment:

  • len(hosts) == 1
  • ismaster == false
  • secondary == false
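
The three conditions above can be sketched as a single check. This is a minimal illustration in Python, assuming the ismaster reply has already been decoded into a dictionary with the hosts, ismaster and secondary fields; the function name should_ignore_host is hypothetical and not part of any driver.

```python
from typing import Any, Mapping


def should_ignore_host(reply: Mapping[str, Any]) -> bool:
    """Return True when an ismaster reply matches the workaround:
    exactly one host listed, ismaster false, and secondary false.
    """
    hosts = reply.get("hosts", [])
    return (
        len(hosts) == 1
        and reply.get("ismaster") is False
        and reply.get("secondary") is False
    )
```

A True result means the host should be ignored for the moment. A False result only means the workaround did not match, not that the host is confirmed as a member.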

Comment by Kyle Banker [ 20/Jan/12 ]

If the node is not allowed to terminate, it should at the very least revert to the state of a node whose replica set has not been initiated yet. That is, the rs.status() and db.isMaster() commands should never indicate that the node is part of a replica set.
