I cannot provide the full output of rs.conf() from before the crash, because this is a production cluster and I have no way to retrieve it now, but here is what I did:
// on mongo1 (primary)
conf = rs.conf()
conf.members[2].votes = 0 // mongo3: removed its vote to get an odd number of voting members (but the priority of this member is still > 0)
conf.members[3].priority = 3 // mongo4: set to highest priority in cluster
conf.members[3].votes = 1 // mongo4: votes was 0 before that
conf.members[4].priority = 0.5 // mongo5: priority was 0 before that
conf.members[4].votes = 1 // mongo5: votes was 0 before that
conf.members[5].priority = 0.5 // mongo6: priority was 0 before that
conf.members[5].votes = 1 // mongo6: votes was 0 before that
rs.reconfig(conf)
From the logs, this is what conf looked like:
{
  "_id": "repl1",
  "version": 106772,
  "members": [
    {
      "_id": 23,
      "host": "mongo1:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0.8,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 24,
      "host": "mongo2:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0.4,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 25,
      "host": "mongo3:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0.3,
      "tags": {},
      "slaveDelay": 0,
      "votes": 0
    },
    {
      "_id": 26,
      "host": "mongo4:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 3,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 27,
      "host": "mongo5:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0.6,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    },
    {
      "_id": 28,
      "host": "mongo6:27017",
      "arbiterOnly": false,
      "buildIndexes": true,
      "hidden": false,
      "priority": 0.5,
      "tags": {},
      "slaveDelay": 0,
      "votes": 1
    }
  ],
  "settings": {
    "chainingAllowed": true,
    "heartbeatTimeoutSecs": 10,
    "getLastErrorModes": {},
    "getLastErrorDefaults": {
      "w": 1,
      "wtimeout": 0
    }
  }
}
Logs:
On 3.0 + MMAPv1 nodes:
mongo1:
2016-07-04T07:05:31.833+0000 I REPL [conn14464810] replSetReconfig admin command received from client
2016-07-04T07:05:31.840+0000 I REPL [conn14464810] replSetReconfig config object with 6 members parses ok
2016-07-04T07:05:31.842+0000 I REPL [ReplicationExecutor] New replica set config in use: { _id: "repl1", version: 106772, members: [ { _id: 23, host: "mongo1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.8, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 24, host: "mongo2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.4, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 25, host: "mongo3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.3, tags: {}, slaveDelay: 0, votes: 0 }, { _id: 26, host: "mongo4:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 27, host: "mongo5:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.6, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 28, host: "mongo6:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.5, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatTimeoutSecs: 10, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2016-07-04T07:05:31.842+0000 I REPL [ReplicationExecutor] This node is mongo1:27017 in the config
mongo2:
2016-07-04T07:05:33.321+0000 I NETWORK [ReplExecNetThread-14007] DBClientCursor::init call() failed
2016-07-04T07:05:33.321+0000 I NETWORK [ReplExecNetThread-14006] DBClientCursor::init call() failed
2016-07-04T07:05:33.321+0000 I REPL [ReplicationExecutor] Error in heartbeat request to mongo6:27017; Location10276 DBClientBase::findN: transport error: mongo6:27017 ns: admin.$cmd query: { replSetHeartbeat: "repl1", pv: 1, v: 106771, from: "mongo2:27017", fromId: 24, checkEmpty: false }
2016-07-04T07:05:33.321+0000 I REPL [ReplicationExecutor] Error in heartbeat request to mongo5:27017; Location10276 DBClientBase::findN: transport error: mongo5:27017 ns: admin.$cmd query: { replSetHeartbeat: "repl1", pv: 1, v: 106771, from: "mongo2:27017", fromId: 24, checkEmpty: false }
2016-07-04T07:05:33.335+0000 I NETWORK [ReplExecNetThread-14008] DBClientCursor::init call() failed
2016-07-04T07:05:33.335+0000 I REPL [ReplicationExecutor] Error in heartbeat request to mongo4:27017; Location10276 DBClientBase::findN: transport error: mongo4:27017 ns: admin.$cmd query: { replSetHeartbeat: "repl1", pv: 1, v: 106771, from: "mongo2:27017", fromId: 24, checkEmpty: false }
mongo3:
2016-07-04T07:05:31.836+0000 I REPL [SyncSourceFeedback] SyncSourceFeedback error sending update, response: { ok: 0.0, errmsg: "Received replSetUpdatePosition for node with memberId 24 whose config version of 106771 doesn't match our config version of 106772", code: 93 }
2016-07-04T07:05:31.838+0000 I NETWORK [conn12187888] end connection mongo6:57855 (621 connections now open)
2016-07-04T07:05:31.838+0000 I NETWORK [conn12187695] end connection mongo5:54784 (621 connections now open)
2016-07-04T07:05:31.838+0000 I REPL [ReplicationExecutor] could not find member to sync from
2016-07-04T07:05:31.913+0000 I NETWORK [conn12720076] end connection 127.0.0.1:45242 (619 connections now open)
2016-07-04T07:05:33.224+0000 I NETWORK [initandlisten] connection accepted from mongo2:47182 #12720086 (620 connections now open)
2016-07-04T07:05:33.224+0000 I NETWORK [conn12720086] end connection mongo2:47182 (619 connections now open)
2016-07-04T07:05:33.829+0000 I NETWORK [ReplExecNetThread-43360] DBClientCursor::init call() failed
2016-07-04T07:05:33.835+0000 I REPL [ReplicationExecutor] Error in heartbeat request to mongo4:27017; Location10276 DBClientBase::findN: transport error: mongo4:27017 ns: admin.$cmd query: { replSetHeartbeat: "repl1", pv: 1, v: 106771, from: "mongo3:27017", fromId: 25, checkEmpty: false }
On all 3.2 + WT nodes (mongo4, mongo5, mongo6; identical output):
2016-07-04T07:05:31.846+0000 W REPL [replExecDBWorker-0] Not persisting new configuration in heartbeat response to disk because it is invalid: BadValue: priority must be 0 when non-voting (votes:0)
2016-07-04T07:05:31.846+0000 E REPL [ReplicationExecutor] Could not validate configuration received from remote node; Removing self until an acceptable configuration arrives; BadValue: priority must be 0 when non-voting (votes:0)
2016-07-04T07:05:31.846+0000 I REPL [ReplicationExecutor] New replica set config in use: { _id: "repl1", version: 106772, members: [ { _id: 23, host: "mongo1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.8, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 24, host: "mongo2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.4, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 25, host: "mongo3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.3, tags: {}, slaveDelay: 0, votes: 0 }, { _id: 26, host: "mongo4:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 27, host: "mongo5:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.6, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 28, host: "mongo6:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.5, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2016-07-04T07:05:31.846+0000 I REPL [ReplicationExecutor] This node is not a member of the config
2016-07-04T07:05:31.846+0000 I REPL [ReplicationExecutor] transition to REMOVED
The reason for the crash is obvious: MongoDB 3.0 accepted a configuration that is invalid in MongoDB 3.2 (mongo3 has votes: 0 but a priority > 0), so all 3.2 nodes removed themselves, leaving a crippled cluster of 3 members, only 2 of which had voting rights. With 5 voting members in the config, 2 votes are not a majority.
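For illustration only, here is a minimal sketch of a pre-flight check that could be run in the shell on the conf object before calling rs.reconfig(conf). Both helper functions are hypothetical (they are not part of the mongo shell), the config literal is a trimmed copy of the one from the logs above, and the sketch assumes that votes and priority both default to 1 when omitted.

```javascript
// Hypothetical helper, not a mongo shell built-in: returns the members that a
// 3.2 node would reject with "priority must be 0 when non-voting (votes:0)".
function findNonVotingWithPriority(conf) {
  return conf.members.filter(function (m) {
    var votes = m.votes === undefined ? 1 : m.votes;          // assumed default: 1
    var priority = m.priority === undefined ? 1 : m.priority; // assumed default: 1
    return votes === 0 && priority > 0;
  });
}

// Hypothetical helper: true if the reachable hosts hold a strict majority
// of all votes defined in the config.
function hasVotingMajority(conf, reachableHosts) {
  var voters = conf.members.filter(function (m) {
    return (m.votes === undefined ? 1 : m.votes) > 0;
  });
  var reachableVoters = voters.filter(function (m) {
    return reachableHosts.indexOf(m.host) !== -1;
  });
  return reachableVoters.length > voters.length / 2;
}

// Trimmed copy of the config from the logs (only the fields the checks need).
var conf = {
  _id: "repl1",
  members: [
    { _id: 23, host: "mongo1:27017", priority: 0.8, votes: 1 },
    { _id: 24, host: "mongo2:27017", priority: 0.4, votes: 1 },
    { _id: 25, host: "mongo3:27017", priority: 0.3, votes: 0 },
    { _id: 26, host: "mongo4:27017", priority: 3,   votes: 1 },
    { _id: 27, host: "mongo5:27017", priority: 0.6, votes: 1 },
    { _id: 28, host: "mongo6:27017", priority: 0.5, votes: 1 }
  ]
};

// Members that would make the 3.2 nodes reject the config.
var offenders = findNonVotingWithPriority(conf);

// Hosts still in the set after the 3.2 nodes went to REMOVED.
var survivors = ["mongo1:27017", "mongo2:27017", "mongo3:27017"];
var canElect = hasVotingMajority(conf, survivors);
```

With this config, offenders contains only the mongo3 entry and canElect is false, which matches what the logs show. Running a check like this before the reconfig would have flagged mongo3 while the set was still healthy.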