[SERVER-40182] All replica set members are secondaries Created: 17/Mar/19 Updated: 19/Mar/19 Resolved: 19/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Pavel Zeger [X] | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi. We have 7 servers in production across two networks (uk and nl); one of these servers is an arbiter and another is hidden. We encountered a problem with elections even though all servers were online (no network issues): all servers were in the SECONDARY state and we didn't see any errors in the logs. I have attached the configuration. I believe we still had a majority in that situation. What could be the problem?
[ { "_id" : 3, "name" : "uk1:27017", "health" : 1, "state" : 1, "stateStr" : "PRIMARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.621Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:27.664Z"), "pingMs" : NumberLong(13), "electionTime" : Timestamp(1552738992, 1), "electionDate" : ISODate("2019-03-16T12:23:12Z"), "configVersion" : 498059 }, { "_id" : 4, "name" : "uk2:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.621Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:27.278Z"), "pingMs" : NumberLong(13), "syncingTo" : "uk1:27017", "configVersion" : 498059 }, { "_id" : 5, "name" : "nl1:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.596Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:27.398Z"), "pingMs" : NumberLong(7), "syncingTo" : "uk1:27017", "configVersion" : 498059 }, { "_id" : 6, "name" : "nl2:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.596Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:25.885Z"), "pingMs" : NumberLong(7), "syncingTo" : "uk1:27017", "configVersion" : 498059 }, { "_id" : 7, "name" : "az1:27017", "health" : 1, "state" : 7, "stateStr" : "ARBITER", "uptime" : 10951400, "syncingTo" : "nl1:27017", "infoMessage" : "syncing from: nl1wv8912:27017", "configVersion" : 498059, "self" : true }, { "_id" : 8, "name" : "uk2:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.622Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:27.623Z"), "pingMs" : NumberLong(14), "syncingTo" : "uk2:27017", "configVersion" : 498059 }, { "_id" : 9, "name" : "nl3:27017", "health" : 1, "state" : 2, "stateStr" : "SECONDARY", "uptime" : 74206, "optime" : Timestamp(1552820007, 13), "optimeDate" : ISODate("2019-03-17T10:53:27Z"), "lastHeartbeat" : ISODate("2019-03-17T10:53:27.596Z"), "lastHeartbeatRecv" : ISODate("2019-03-17T10:53:26.222Z"), "pingMs" : NumberLong(7), "syncingTo" : "nl2:27017", "configVersion" : 498059 }] { }, }, }, }, }, }, }, |
| Comments |
| Comment by Eric Sedor [ 19/Mar/19 ] |
|
Thanks. Upgrading to Replication Protocol version 1 (pv1) should help resolve this issue. After that, we recommend upgrading to MongoDB 3.4+. MongoDB 3.2 reached end of life in September of 2018. |
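For reference, the documented way to switch an existing replica set to pv1 is a reconfiguration issued from the primary. A minimal sketch, run in the mongo shell connected to the primary:

    // Fetch the current replica set configuration, switch the election
    // protocol to pv1, and apply the new configuration.
    cfg = rs.conf();
    cfg.protocolVersion = 1;
    rs.reconfig(cfg);
    // Confirm the change took effect.
    rs.conf().protocolVersion;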
| Comment by Pavel Zeger [X] [ 19/Mar/19 ] |
|
3.2 |
| Comment by Eric Sedor [ 18/Mar/19 ] |
|
Hello; can you please clarify the MongoDB version you're running? |
| Comment by Pavel Zeger [X] [ 17/Mar/19 ] |
|
Also, I found these lines on the nl1 host:
2019-03-16T09:34:27.010+0000 I REPL [ReplicationExecutor] uk2:27017 is trying to elect itself but uk1:27017 is already primary
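Log lines like this, where one member tries to elect itself while another is already primary, are characteristic of election handling under the legacy protocol (pv0). One way to confirm which protocol the set is currently using, sketched in the mongo shell (on configurations created before 3.2 the field is typically absent, which implies pv0):

    // Inspect the replica set config; protocolVersion is absent or 0 under pv0, 1 under pv1.
    var pv = rs.conf().protocolVersion;
    print("replication protocol version: " + (pv === undefined ? 0 : pv));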