Details
Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.7
Component/s: None
Operating System: ALL
Description
We had a replica set with this configuration (pseudocode):

{
  mongo1: { priority: 3,   votes: 1, role: PRIMARY,   mongo: 3.0.2, storage: MMAPv1 },
  mongo2: { priority: 0.5, votes: 1, role: SECONDARY, mongo: 3.0.2, storage: MMAPv1 },
  mongo3: { priority: 0.5, votes: 1, role: SECONDARY, mongo: 3.0.2, storage: MMAPv1 },
  mongo4: { priority: 0,   votes: 1, role: SECONDARY, mongo: 3.2.7, storage: WiredTiger },
  mongo5: { priority: 0,   votes: 1, role: SECONDARY, mongo: 3.2.7, storage: WiredTiger },
  mongo6: { priority: 0,   votes: 0, role: SECONDARY, hidden: true, mongo: 3.2.7, storage: WiredTiger }
}
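
The same topology can be confirmed from the mongo shell before touching anything; a minimal sketch using only the stock rs.conf() and rs.status() helpers (nothing here is specific to our setup):

// Print priority/votes/hidden for every member from the live config.
rs.conf().members.forEach(function (m) {
    print(m.host + "  priority=" + m.priority + "  votes=" + m.votes + "  hidden=" + !!m.hidden);
});

// And the roles as the set currently sees them.
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr);
});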
The plan was to switch the primary to mongo4 (3.2.7 + WT) so we could upgrade mongo{1,2,3}, but I managed to crash mongo{4,5,6} by running this on mongo1:
conf = {
  members: [
    { host: mongo1..., priority: 3,   votes: 1 },
    { host: mongo2..., priority: 0.5, votes: 1 },
    { host: mongo3..., priority: 0.5, votes: 0 },  // <-- culprit?
    { host: mongo4..., priority: 5,   votes: 1 },
    { host: mongo5..., priority: 0.8, votes: 1 },
    { host: mongo6..., priority: 0.8, votes: 1 }   // was hidden before
  ]
};

rs.reconfig(conf);
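
In hindsight, a quick client-side sanity check before the reconfig would have flagged the bad member; a minimal sketch in plain mongo shell JavaScript, using only the conf object above and the same priority/votes rule the server later reported as BadValue:

// Reject any member that is non-voting but still has a non-zero priority.
var bad = conf.members.filter(function (m) {
    return m.votes === 0 && m.priority > 0;
});
if (bad.length > 0) {
    print("Invalid members (priority must be 0 when votes is 0):");
    bad.forEach(function (m) { printjson(m); });
} else {
    rs.reconfig(conf);
}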
As a result, mongod crashed on mongo{4,5,6}, leaving mongo{1,2,3} in SECONDARY state and unable to elect a PRIMARY.
In the logs we found this:
2016-07-04T07:05:31.842+0000 W REPL [replExecDBWorker-1] Not persisting new configuration in heartbeat response to disk because it is invalid: BadValue: priority must be 0 when non-voting (votes:0)
2016-07-04T07:05:31.842+0000 E REPL [ReplicationExecutor] Could not validate configuration received from remote node; Removing self until an acceptable configuration arrives; BadValue: priority must be 0 when non-voting (votes:0)
2016-07-04T07:05:31.842+0000 I REPL [ReplicationExecutor] New replica set config in use: { _id: "repl1", version: 106772, members: [ { _id: 23, host: "mongo1:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.8, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 24, host: "mongo2:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.4, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 25, host: "mongo3:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.3, tags: {}, slaveDelay: 0, votes: 0 }, { _id: 26, host: "mongo4:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 3.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 27, host: "mongo5:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.6, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 28, host: "mongo6:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 0.5, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
And then:
2016-07-04T07:05:31.865+0000 I - [ReplicationExecutor] Invariant failure i < _members.size() src/mongo/db/repl/replica_set_config.cpp 560
I'm sure it was a mistake on my end to set priority > 0 with votes: 0 on mongo3, but the way the 3.2 + WT nodes reacted to it was certainly not nice.
Also, please suggest how to perform this switchover and upgrade in the least dangerous fashion.
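
For reference, the sequence I would try next (a rough sketch, not an official procedure: it assumes the mongoN:27017 hostnames from the log above, starts each step from the live rs.conf(), and changes one thing per rs.reconfig()):

// All commands run on the current primary (mongo1).

// Step 1: always start from the live config so _id, version and defaults are kept.
var cfg = rs.conf();

// Step 2: un-hide mongo6 and make it a voter; its priority stays 0 for now,
// so nothing about the election outcome changes yet.
cfg.members.forEach(function (m) {
    if (m.host === "mongo6:27017") { m.hidden = false; m.votes = 1; }
});
rs.reconfig(cfg);

// Step 3: in a separate reconfig, raise mongo4 above the current primary and
// give mongo5/mongo6 their final priorities; once mongo4 has caught up,
// the set should hand the PRIMARY role over to it on its own.
cfg = rs.conf();
cfg.members.forEach(function (m) {
    if (m.host === "mongo4:27017") { m.priority = 5; }
    if (m.host === "mongo5:27017" || m.host === "mongo6:27017") { m.priority = 0.8; }
});
rs.reconfig(cfg);

// Step 4: verify the new roles before touching mongo{1,2,3}.
rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr); });

The point is only that each rs.reconfig() changes one thing, so a bad member definition like the one above stays isolated and easy to roll back.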
Attachments
Issue Links
- related to: DOCS-8579 Replica Set Upgrade and configuration validation (Closed)