[SERVER-30066] Alert users when adding a downed node to a replica set Created: 10/Jul/17  Updated: 27/Oct/23  Resolved: 01/Sep/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Benjamin Appréderisse Assignee: Spencer Brody (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

In MongoDB 2.6, we alerted users if they added a replica set member when it was down. To aid in automation, we should consider restoring similar functionality:

replset:PRIMARY> rs.add('1.2.3.7:27017')
{ "down" : [ "1.2.3.6:27017", "1.2.3.7:27017" ], "ok" : 1 }

Original Summary

rs.addArb does not return an error when host is unavailable

Original Description

I am using MongoDB Enterprise Version 3.4.6.

rs.addArb(host)

returns

MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017/admin
MongoDB server version: 3.4.6
{ "ok" : 1 }

when host is not available.

It should have the same behaviour as rs.add(host).

=> Difficult to catch the error during automation.



 Comments   
Comment by Spencer Brody (Inactive) [ 01/Sep/17 ]

In 2.6, the node processing the reconfig sent a message to every other node in the set and reported a node as down if it did not get a response. This behavior was removed because we don't want reconfigs to perform blocking network I/O: it slows down the reconfig and increases its complexity. The resulting notion of whether another node is down is also quite brittle, since it depends only on missing that single message.

The best way to know whether one node considers the other nodes in its set to be down is to check the results of running replSetGetStatus. replSetGetStatus only reports another node as down if the node it is run against hasn't heard any replication messages from that node within the election timeout. This is a more robust way to determine whether a node is down, and it is based on the same state tracking used by the rest of the replication system.
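
For automation, the replSetGetStatus check described above could be sketched roughly as follows. This is only an illustration, not an official recipe: `downMembers` and `sampleStatus` are hypothetical names, the sample document is hand-written, and the sketch assumes each entry in the `members` array carries a `name` and a `health` field where 0 means the queried node considers that member down.

```javascript
// Hypothetical helper: given a replSetGetStatus result, return the names of
// the members that the queried node currently reports as down.
// Assumption: each entry in status.members has "name" and "health" (1 = up, 0 = down).
function downMembers(status) {
  return (status.members || [])
    .filter(function (m) { return m.health === 0; })
    .map(function (m) { return m.name; });
}

// Hand-written sample shaped like a replSetGetStatus result (illustrative values only).
var sampleStatus = {
  set: "replset",
  ok: 1,
  members: [
    { _id: 0, name: "1.2.3.4:27017", health: 1, state: 1, stateStr: "PRIMARY", self: true },
    { _id: 1, name: "1.2.3.6:27017", health: 0, state: 8, stateStr: "(not reachable/healthy)" }
  ]
};

// Against a live replica set, the input would instead come from rs.status()
// or db.adminCommand({ replSetGetStatus: 1 }) in the mongo shell.
var down = downMembers(sampleStatus);
```

An automation script could call such a helper some time after rs.add() or rs.addArb() and fail if the new host appears in the returned list. Since a node is only reported as down once the election timeout has elapsed without contact, the check is more meaningful after waiting at least that long.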

Comment by Kelsey Schubert [ 19/Jul/17 ]

Hi bappr,

This behavior appears to be consistent between rs.addArb() and rs.add():

$ mongo --eval "rs.addArb('1.2.3.4:27017')"
MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.6
{ "ok" : 1 }
$ mongo --eval "rs.add('1.2.3.5:27017')"
MongoDB shell version v3.4.6
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.4.6
{ "ok" : 1 }

I see that a while back, in MongoDB 2.6, we provided additional information:

$ mongo
MongoDB shell version: 2.6.12
connecting to: test
replset:PRIMARY> rs.addArb('1.2.3.6:27017')
{ "down" : [ "1.2.3.6:27017" ], "ok" : 1 }
replset:PRIMARY> rs.add('1.2.3.7:27017')
{ "down" : [ "1.2.3.6:27017", "1.2.3.7:27017" ], "ok" : 1 }

Therefore, I'm repurposing this ticket as an improvement request to restore this functionality and sending it to the replication team to consider. Please let me know if this does not address your concerns.

Kind regards,
Thomas
