|
It looks like the new node 9019 rejects the new config:
| 2020-05-05T13:44:59.864-04:00 I REPL 21400 [conn21] "Scheduling heartbeat to fetch a new config since we are not a member of our current config","attr":{"senderHost":"louisamac:9004"}
|
| 2020-05-05T13:44:59.865-04:00 I CONNPOOL 22576 [ReplNetwork] "Connecting to {hostAndPort}","attr":{"hostAndPort":"louisamac:9004"}
|
| 2020-05-05T13:44:59.911-04:00 W REPL 21487 [ReplCoord-0] "Not persisting new configuration in heartbeat response to disk because it is invalid","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"Nodes being used for config servers must be started with the --configsvr flag"}}
|
| 2020-05-05T13:44:59.911-04:00 E REPL 21490 [ReplCoord-0] "Could not validate configuration received from remote node; Removing self until an acceptable configuration arrives","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"Nodes being used for config servers must be started with the --configsvr flag"}}
|
| 2020-05-05T13:44:59.911-04:00 I REPL 21392 [ReplCoord-0] "New replica set config in use","attr":{"config":{"_id":"csrs","version":2,"term":1,"configsvr":true,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"members":[{"_id":0,"host":"louisamac:9004","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1.0,"tags":{},"slaveDelay":0,"votes":1},{"_id":1,"host":"louisamac:9019","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1.0,"tags":{},"slaveDelay":0,"votes":1}],"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"5eb1a5c5950801a13ee31207"}}}}
|
| 2020-05-05T13:44:59.924-04:00 I REPL 21394 [ReplCoord-0] "This node is not a member of the config"
|
| 2020-05-05T13:44:59.937-04:00 I REPL 21358 [ReplCoord-0] "Replica set state transition","attr":{"newState":"REMOVED","oldState":"STARTUP"}
|
Since node 9019 is removed, I think the original primary 9004 will eventually step down (after 10 seconds, which is the default election timeout). The reconfig command is interrupted because we kill the operation on step down:
| 2020-05-05T13:45:09.913-04:00 I REPL 21809 [ReplCoord-2] "Can't see a majority of the set, relinquishing primary"
|
| 2020-05-05T13:45:09.922-04:00 I REPL 21475 [ReplCoord-2] "Stepping down from primary in response to heartbeat"
|
| 2020-05-05T13:45:09.922-04:00 I REPL 21343 [RstlKillOpThread] "Starting to kill user operations"
|
| 2020-05-05T13:45:09.922-04:00 I REPL 21344 [RstlKillOpThread] "Stopped killing user operations"
|
| 2020-05-05T13:45:09.922-04:00 I REPL 21340 [RstlKillOpThread] "State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":2,"userOpsRunning":0}}
|
...
|
| 2020-05-05T13:45:09.923-04:00 I COMMAND 51803 [conn104] "Slow query","attr":{"type":"command","ns":"local.system.replset","appName":"MongoDB Automation Agent v10.15.0 (git: DEV)","command":{"replSetReconfig":{"_id":"csrs","version":2,"members":[{"slaveDelay":0,"votes":1,"arbiterOnly":false,"hidden":false,"tags":{},"buildIndexes":true,"host":"louisamac:9004","priority":1.0,"_id":0},{"hidden":false,"priority":1.0,"tags":{},"votes":1,"host":"louisamac:9019","buildIndexes":true,"arbiterOnly":false,"slaveDelay":0,"_id":1}],"settings":{"chainingAllowed":true,"heartbeatTimeoutSecs":10,"catchUpTimeoutMillis":-1,"getLastErrorDefaults":{"w":1,"wtimeout":0},"getLastErrorModes":{},"electionTimeoutMillis":10000,"replicaSetId":{"$oid":"5eb1a5c5950801a13ee31207"},"heartbeatIntervalMillis":2000,"catchUpTakeoverDelayMillis":30000},"configsvr":true,"protocolVersion":1},"maxTimeMS":40000,"lsid":{"id":{"$uuid":"da7e2d9a-165d-4383-8033-10bc0b1b480e"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1588700699,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"admin","$readPreference":{"mode":"primaryPreferred"}},"numYields":0,"ok":0,"errMsg":"Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 2, term: 1} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted","errName":11602,"errCode":11602,"reslen":604,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":5}},"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":1,"w":4}},"Database":{"acquireCount":{"r":1,"w":2,"W":1}},"Collection":{"acquireCount":{"w":2}},"Mutex":{"acquireCount":{"r":3}},"oplog":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":2,"timeAcquiringMicros":3},"protocol":"op_msg","durationMillis":10097}
|
|
|
It looks like the primary (9004) is able to persist the new config to disk and is waiting for a majority to replicate the config:
| 2020-05-05T13:44:59.906-04:00 I REPL 21392 [conn104] "New replica set config in use","attr":{"config":{"_id":"csrs","version":2,"term":1,"configsvr":true,"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"members":[{"_id":0,"host":"louisamac:9004","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1.0,"tags":{},"slaveDelay":0,"votes":1},{"_id":1,"host":"louisamac:9019","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1.0,"tags":{},"slaveDelay":0,"votes":1}],"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"5eb1a5c5950801a13ee31207"}}}}
|
| 2020-05-05T13:44:59.909-04:00 I REPL 21393 [conn104] "Found self in config","attr":{"hostAndPort":"louisamac:9004"}
|
| 2020-05-05T13:44:59.910-04:00 I REPL 4508702 [conn104] "Waiting for the current config to propagate to a majority of nodes."
|
A bit later, we see failed heartbeats to the new node we added (9019):
| 2020-05-05T13:45:01.922-04:00 I REPL_HB 23974 [ReplCoord-2] "Heartbeat failed after max retries","attr":{"target":"louisamac:9019","maxHeartbeatRetries":2,"error":{"code":93,"codeName":"InvalidReplicaSetConfig","errmsg":"Our replica set configuration is invalid or does not include us"}}
|
|