[SERVER-27694] Uncertain behavior after rs.remove() Created: 16/Jan/17  Updated: 15/Nov/21  Resolved: 07/Apr/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Zhang Youdong Assignee: Mark Agarunov
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Suppose a replica set has two members: node1:27017 (primary) and node2:27017 (secondary).

case1: after the secondary restarts, it reports "could not find member to sync from"; it cannot choose a sync source because there is no new data to sync.

{
	"set" : "mongo-9555",
	"date" : ISODate("2017-01-16T09:35:02.381Z"),
	"myState" : 1,
	"term" : NumberLong(-1),
	"heartbeatIntervalMillis" : NumberLong(2000),
	"members" : [
		{
			"_id" : 0,
			"name" : "node1:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 11,
			"optime" : Timestamp(1484550472, 2),
			"optimeDate" : ISODate("2017-01-16T07:07:52Z"),
			"electionTime" : Timestamp(1484559293, 1),
			"electionDate" : ISODate("2017-01-16T09:34:53Z"),
			"configVersion" : 372974,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "node2:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 10,
			"optime" : Timestamp(1484550472, 2),
			"optimeDate" : ISODate("2017-01-16T07:07:52Z"),
			"lastHeartbeat" : ISODate("2017-01-16T09:35:01.585Z"),
			"lastHeartbeatRecv" : ISODate("2017-01-16T09:35:01.512Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "could not find member to sync from",
			"configVersion" : 372974
		}
	],
	"ok" : 1
}

case2: after some data is written to the primary, the secondary chooses a sync source successfully.

{
	"set" : "mongo-9555",
	"date" : ISODate("2017-01-16T09:41:31.490Z"),
	"myState" : 1,
	"term" : NumberLong(-1),
	"heartbeatIntervalMillis" : NumberLong(2000),
	"members" : [
		{
			"_id" : 0,
			"name" : "node1:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 400,
			"optime" : Timestamp(1484559669, 2),
			"optimeDate" : ISODate("2017-01-16T09:41:09Z"),
			"electionTime" : Timestamp(1484559293, 1),
			"electionDate" : ISODate("2017-01-16T09:34:53Z"),
			"configVersion" : 372974,
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "node2:27017",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 399,
			"optime" : Timestamp(1484559669, 2),
			"optimeDate" : ISODate("2017-01-16T09:41:09Z"),
			"lastHeartbeat" : ISODate("2017-01-16T09:41:29.680Z"),
			"lastHeartbeatRecv" : ISODate("2017-01-16T09:41:29.620Z"),
			"pingMs" : NumberLong(0),
			"lastHeartbeatMessage" : "syncing from node1:27017",
			"syncingTo" : "node1:27017",
			"configVersion" : 372974
		}
	],
	"ok" : 1
}

rs.remove("node2:27017") behaves differently in the two cases above.

case1: node2 transitions to the REMOVED state and cannot find a sync source.
case2: node2 transitions to the REMOVED state but continues to tail the oplog from the primary.

So what is the expected behavior when a node is removed from a replica set?
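
For readers comparing the two rs.status() outputs above, the relevant difference is whether node2 reports a "syncingTo" field and what its "lastHeartbeatMessage" says. The following is a minimal sketch of pulling those fields out of a status document. The helper name summarizeSyncState and the trimmed sample objects are illustrative assumptions, not part of the mongo shell or of this report; only the field names are taken from the outputs above.

```javascript
// Hypothetical helper (not a mongo shell built-in): for each member of an
// rs.status() document, report its state and the sync source it advertises.
function summarizeSyncState(status) {
  return (status.members || []).map(function (m) {
    return {
      name: m.name,
      stateStr: m.stateStr,
      // "syncingTo" is only present when the member reports a sync source.
      syncSource: m.syncingTo || null,
      message: m.lastHeartbeatMessage || ""
    };
  });
}

// Trimmed copies of the two outputs above, keeping only the fields read here.
var case1 = { members: [
  { name: "node1:27017", stateStr: "PRIMARY" },
  { name: "node2:27017", stateStr: "SECONDARY",
    lastHeartbeatMessage: "could not find member to sync from" }
] };
var case2 = { members: [
  { name: "node1:27017", stateStr: "PRIMARY" },
  { name: "node2:27017", stateStr: "SECONDARY",
    lastHeartbeatMessage: "syncing from node1:27017",
    syncingTo: "node1:27017" }
] };

console.log(summarizeSyncState(case1));
console.log(summarizeSyncState(case2));
```

Against a live primary, the same function could presumably be passed the result of rs.status() directly, run just before rs.remove(), to record which of the two cases the secondary was in.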



 Comments   
Comment by Zhang Youdong [ 07/Feb/17 ]

Mark Agarunov

The problem is that when the node is in the REMOVED state, it still syncs data from the PRIMARY. You can create a new database on the PRIMARY and then see that database on the removed node.
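
One way to check this claim, sketched here under stated assumptions, is to take two rs.status() samples on the removed node and compare their optimes: if the node stays REMOVED but its optime moves forward, it is still applying operations. The helper names below are hypothetical. Timestamps are modeled as plain { t, i } objects (seconds and increment) so the sketch runs outside the shell, and the second sample's optime is an invented value for illustration only.

```javascript
// Hypothetical helpers for comparing two rs.status() samples from one node.

// Handles both shapes seen in this ticket: a bare Timestamp ("optime":
// Timestamp(...)) and the { ts, t } form. Timestamps are plain { t, i } here.
function optimeTs(status) {
  var o = status.optime;
  return o && o.ts ? o.ts : o;
}

function isRemoved(status) {
  return status.state === 10 || status.stateStr === "REMOVED";
}

// True when the node is REMOVED in both samples and its optime advanced.
function appliedOpsWhileRemoved(before, after) {
  if (!isRemoved(before) || !isRemoved(after)) return false;
  var a = optimeTs(before), b = optimeTs(after);
  if (!a || !b) return false;
  return b.t > a.t || (b.t === a.t && b.i > a.i);
}

// First sample mirrors a REMOVED status document; the second optime is made up.
var before = { state: 10, stateStr: "REMOVED",
               optime: { ts: { t: 1485976819, i: 1 }, t: 2 } };
var after  = { state: 10, stateStr: "REMOVED",
               optime: { ts: { t: 1485976900, i: 1 }, t: 2 } };

console.log(appliedOpsWhileRemoved(before, after));
```

An unchanged optime across samples taken before and after a write on the primary would instead suggest that the removed node has stopped syncing.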

Comment by Mark Agarunov [ 03/Feb/17 ]

Hello zyd_com,

Unfortunately we've been unable to reproduce the behavior you've described. After removing the node without shutting it down first, the output I am seeing is:

Secondary:

Marks-MacBook-Pro(mongod-3.2.11) test> rs.status()
{
  "state": 10,
  "stateStr": "REMOVED",
  "uptime": 135,
  "optime": {
    "ts": Timestamp(1485976819, 1),
    "t": NumberLong("2")
  },
  "optimeDate": ISODate("2017-02-01T19:20:19Z"),
  "ok": 0,
  "errmsg": "Our replica set config is invalid or we are not a member of it",
  "code": 93
}

Primary:

Marks-MacBook-Pro(mongod-3.2.11)[PRIMARY:replset] test> rs.status()
{
  "set": "replset",
  "date": ISODate("2017-02-01T19:20:54.509Z"),
  "myState": 1,
  "term": NumberLong("2"),
  "heartbeatIntervalMillis": NumberLong("2000"),
  "members": [
    {
      "_id": 0,
      "name": "Marks-MacBook-Pro.local:27017",
      "health": 1,
      "state": 1,
      "stateStr": "PRIMARY",
      "uptime": 246,
      "optime": {
        "ts": Timestamp(1485976819, 1),
        "t": NumberLong("2")
      },
      "optimeDate": ISODate("2017-02-01T19:20:19Z"),
      "electionTime": Timestamp(1485976716, 1),
      "electionDate": ISODate("2017-02-01T19:18:36Z"),
      "configVersion": 2,
      "self": true
    }
  ],
  "ok": 1
}

The output is consistent both before and after restarts, as well as before and after inserting documents on the primary.

As mentioned in the documentation, using db.shutdownServer() to shut down the node before removing it is the supported procedure, and as you mentioned, does not exhibit this behavior.

Thanks,
Mark

Comment by Zhang Youdong [ 19/Jan/17 ]

Thanks for your reply!

There is no problem if the node is shut down first. However, if rs.remove() is executed without shutting down the node being removed, the removed node continues to sync data from the primary. So I want to know what the expected behavior is by design: does the removed node keep syncing data, or does the sync stop?

Comment by Mark Agarunov [ 17/Jan/17 ]

Hello zyd_com,

Thank you for your report. To better assist you, I'd like to clarify a couple things. Is the secondary being shut down before removal with db.shutdownServer()? Do you still see this behavior if the node is removed by following the procedure detailed at https://docs.mongodb.com/manual/tutorial/remove-replica-set-member/ ? Additionally, please provide the exact steps and commands used to reproduce this behavior if possible.

Thanks,
Mark
