[SERVER-2694] Replication Sets ending up with all secondaries... and no primary Created: 07/Mar/11  Updated: 30/Mar/12  Resolved: 07/Mar/11

Status: Closed
Project: Core Server
Component/s: Admin, Replication, Usability
Affects Version/s: 1.8.0-rc0, 1.8.0-rc1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Peter Colclough Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10 64 bit, 8gig memory... too much disk to worry about


Operating System: Linux
Participants:

 Description   

Firstly... as a new user... brilliant package... thanks. (And stupidly I posted this on the Ubuntu/mongo log as well... sorry... Monday morning syndrome.)

Now... I have 6 instances in a replication set, spread over 2 physical machines. All works fine. If I then take down one of the machines, I end up with 3 instances, all secondaries. This is a basic setup with default voting rights and no arbiter.
The result of a rs.status() is below:

mycache:SECONDARY> rs.status()
{
    "set" : "mycache",
    "date" : ISODate("2011-03-04T15:49:01Z"),
    "myState" : 2,
    "members" : [
        {
            "_id" : 0,
            "name" : "n.n.n.1:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 202,
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "lastHeartbeat" : ISODate("2011-03-04T15:49:01Z")
        },
        {
            "_id" : 1,
            "name" : "n.n.n.2:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "n.n.n.3:27019",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 202,
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "lastHeartbeat" : ISODate("2011-03-04T15:49:01Z")
        },
        {
            "_id" : 3,
            "name" : "n.n.1.1:27017",
            "health" : 0,
            "state" : 2,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "lastHeartbeat" : ISODate("2011-03-04T15:46:45Z"),
            "errmsg" : "socket exception"
        },
        {
            "_id" : 4,
            "name" : "n.n.1.2:27018",
            "health" : 0,
            "state" : 1,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "lastHeartbeat" : ISODate("2011-03-04T15:46:45Z"),
            "errmsg" : "socket exception"
        },
        {
            "_id" : 5,
            "name" : "n.n.1.3:27019",
            "health" : 0,
            "state" : 2,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : { "t" : 1299250255000, "i" : 1 },
            "optimeDate" : ISODate("2011-03-04T14:50:55Z"),
            "lastHeartbeat" : ISODate("2011-03-04T15:46:45Z"),
            "errmsg" : "socket exception"
        }
    ],
    "ok" : 1
}

1. I tried reconfig, but that needs a primary, which I don't have.
2. I tried taking an instance down, freezing the other two, and bringing the third back up... it came back as a secondary.
3. I am going to try creating a new instance and setting it up as an arbiter, to see if that can help elect a primary. However, this is not a long-term solution (see 4 below).
4. If I have more than one machine taking part in a replication set then, in theory, for a resilient system each machine would need to have an arbiter, in case another machine got taken out. With an even number of machines, that gives us an even number of arbiters, which doesn't help if they are all in play (unless I am missing something obvious... not for the first time).
If, however, we assign bitwise voting rights to each instance in a replication set (1, 2, 4, 8, 16, ...), then any instance, or a whole machine, can be downed and a definite primary will still be voted in. This removes the need for an arbiter, and also gives the admins a chance to prioritise the servers taking part... but I need a primary to change the config (a rough sketch of the idea follows below).
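
For illustration (editor's sketch, not part of the original report): the closest existing mechanism to this idea is the per-member votes field in the replica set configuration, which in older releases could be set above 1 (newer MongoDB versions restrict it to 0 or 1). A weighting like the one below would let the three instances on machine 1 hold a strict majority of the seven total votes even if machine 2 disappears. Note that rs.reconfig() itself requires a primary, which is exactly the catch-22 described in this ticket.

> cfg = rs.conf()
> cfg.members[0].votes = 2       // n.n.n.1:27017 — machine 1 now holds 4 of the 7 total votes
> rs.reconfig(cfg)               // only possible while a primary is still up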

Thanks in advance for any help



 Comments   
Comment by Peter Colclough [ 11/Mar/11 ]

Thanks Andrew... and others. I had already read those sections. I realise we have a 'catch-22' here. I am off to play with some scenarios to see if we can 'automatically' recover, email the sysadmins, and avoid killing the system while we recover.

Thanks for your help
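
As an illustration of the kind of automatic check described above (editor's sketch; the file name, invocation and notification hook are made up), a periodic job could connect to any reachable member and alert when no member reports state 1, i.e. PRIMARY:

// check_primary.js — run periodically, e.g.: mongo --quiet somehost:27017/admin check_primary.js
var status = rs.status();
var hasPrimary = status.members.some(function (m) {
    return m.health === 1 && m.state === 1;   // state 1 == PRIMARY
});
if (!hasPrimary) {
    print("ALERT: replica set " + status.set + " has no primary as of " + status.date);
    // hook your own notification in here, e.g. have cron mail this output to the sysadmins
}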

Comment by Andrew Armstrong [ 11/Mar/11 ]

Try reading http://www.mongodb.org/display/DOCS/Reconfiguring+a+replica+set+when+members+are+down

You may consider running an arbiter node on a separate machine (e.g. a web server) so you have an odd number of servers.

The arbiter, as mentioned previously, is very lightweight, is not queried and holds no data; all it does is act as a decision maker and cast a vote when there are failures.
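
For reference (editor's sketch; the hostname, port and paths are made up): an arbiter is just a mongod started with --replSet and added to the set with rs.addArb() from the primary, so it can live on any small third machine such as the web server suggested above.

$ mongod --replSet mycache --port 27020 --dbpath /data/arbiter --fork --logpath /var/log/mongodb/arbiter.log
# then, from a mongo shell connected to the current primary:
> rs.addArb("webserver.example.com:27020")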

Comment by Peter Colclough [ 11/Mar/11 ]

Hi Eliot,

Thanks for the quick response. I kind of accept that, which is why I started with 3 nodes on a server... which is a majority, unless they each vote for the next one down the line. I then doubled up the servers to test the 'inter-server operability' (I know... of COURSE it works).

And I now understand... you need a majority of the total number of servers... not a majority of the 'working' servers...

OK... so how do I add a new instance on the working server, to give me a majority, bearing in mind I can't change the config, as I don't have a primary?

Thanks for the help

Peter C

Comment by Kristina Chodorow (Inactive) [ 08/Mar/11 ]

Thus, the recommended approach is to have an odd number of servers. See the "Rationale" section of http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts.

The short answer is: the system is self-monitoring, it elects a primary when it safely can. You can't have what you're looking for (always have a primary) automatically without ending up with multiple masters and, thus, the possibility of conflicting writes.

It would be possible to allow more automatic reconfiguring of a set with no primary, but I don't think "most members unexpectedly and permanently go down" happens regularly for most people.

Comment by Peter Colclough [ 08/Mar/11 ]

Ok, that worked... thanks. I still think it would be useful if we could 'programmatically' force a server to be a primary. This would allow a system to self-monitor and, if this situation occurred, at least allow a system monitor to sort something out. It's a catch-22, because of the reasons you gave (i.e. not wanting two primaries on the same set), but it would also allow a primary to be chosen when one system goes down, leaving no majority. An arbiter would only work on a third machine, because if each machine has an arbiter (in case the other goes down), then normal processing will fail, as two arbiters would negate the need for them (if you see what I mean).

Conundrum....

Comment by Kristina Chodorow (Inactive) [ 07/Mar/11 ]

Shut down a server that could be primary (once your set is down to 3 servers) and restart it without the --replSet option and on a different port. Connect to it with the shell and modify the local.system.replset document to only have the 3 servers. Increment the version number and save the document back to the local.system.replset collection. Then restart the server on the correct port with --replSet and the other servers will pick up on the config change.

e.g., going from four servers to two servers:

$ mongo localhost:27021/local
MongoDB shell version: 1.9.0-pre-
connecting to: localhost:27021/local
> config = db.system.replset.findOne()
{
    "_id" : "foo",
    "version" : 4,
    "members" : [
        { "_id" : 0, "host" : "ubuntu:27017" },
        { "_id" : 1, "host" : "ubuntu:27018" },
        { "_id" : 2, "host" : "ubuntu:27019" },
        { "_id" : 3, "host" : "ubuntu:27020" }
    ]
}
> config.members.pop()
{ "_id" : 3, "host" : "ubuntu:27020" }
> config.members.pop()
{ "_id" : 2, "host" : "ubuntu:27019" }
> config.version++
4
> db.system.replset.remove()
> db.system.replset.save(config)
> db.system.replset.find()
{ "_id" : "foo", "version" : 5, "members" : [ { "_id" : 0, "host" : "ubuntu:27017" }, { "_id" : 1, "host" : "ubuntu:27018" } ] }

See also: http://www.mongodb.org/display/DOCS/Reconfiguring+a+replica+set+when+members+are+down

Comment by Peter Colclough [ 07/Mar/11 ]

I see that issue now... thanks. However, I am now in a situation where I have 3 'healthy' nodes, all of which are secondaries, and, it appears, no way of getting one of them to become a primary. I can't add an arbiter, as I need a primary to change the config through.

Is there a way I can 'force' a primary, even if it means using the UI to do this? BTW, 'freezing', standing down, etc. also doesn't achieve this, as I am always in a minority.

This is still a necessary function, as otherwise we would be in a 'Mexican standoff' given the current scenario. I also don't see how voting changes/arbiters can actually help a scenario where a machine or two are taken out of service (or drop unexpectedly), leaving a minority behind... the arbiter would have to be on a separate system, which always has access to all servers on that system...

Comment by Kristina Chodorow (Inactive) [ 07/Mar/11 ]

You can't elect a master based on the number of healthy nodes as then you could have a master on each side of a network partition. There is no way for a cluster of nodes to tell the difference between a network partition and nodes being down.

You need a majority of the total number of nodes to elect a master. That's why we suggest having an odd number of nodes/an arbiter/giving a node one extra vote.
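
To make the arithmetic concrete (editor's note): electing a primary needs a strict majority of all configured votes, i.e. floor(N/2) + 1. With the 6 single-vote members here that is 4, so the 3 surviving members can never elect. Adding a seventh vote, e.g. an arbiter on a third machine or an extra vote on one existing member, keeps the majority threshold at 4, so whichever side ends up with 4 of the 7 votes can still elect a primary.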

Comment by Peter Colclough [ 07/Mar/11 ]

Eliot,

This still remains an issue (in my very humble opinion). If you use a majority of the actual servers, including those that are unreachable, you may never be able to get a usable system. For example, if we had 7 servers, split 4 on one machine and 3 on another, and we take the '3' off... all is fine and dandy. If we take the '4' out, then we have a single machine with 3 instances, all secondaries.

So the way around this is to have an arbiter. The arbiter would have to be on a third machine, so it isn't taken out if we down a server. Having an arbiter on one of the main machines would simply cause an issue if that machine were taken out. If the arbiter were on a third machine and that was taken out, we are back to square one again... if you see what I mean.

Surely the 'voting' should take place between 'reachable' systems that are 'healthy'. That way you can always have a majority among the working systems.

Or am I really missing the point here?

Thanks in advance

Peter C

Comment by Eliot Horowitz (Inactive) [ 07/Mar/11 ]

Looks like you have 3 nodes up and 3 nodes down.
3/6 nodes is not a majority, so it won't elect a primary.
You should try to have an odd number of nodes.
