[SERVER-1987] Assumption of master: arbiter should vote for most ahead replica member Created: 22/Oct/10 Updated: 12/Jul/16 Resolved: 27/Oct/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.7.1 |
| Fix Version/s: | 1.7.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mark Kirkwood | Assignee: | Kristina Chodorow (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu lucid x86-64, mongodb compiled from src |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
In the case of a 2 member + 1 arbiter replica set, e.g.:

> rs.isMaster()

consider the following:

1/ kill the member on port 27017, so 27018 becomes the master:

I (possibly wrongly) was under the impression that optime was a factor in deciding who was to be primary, but clearly in this case the arbiter is voting for the most behind member to be primary when it comes up.

Log information from the member on port 27018 - it notes the other member is down and becomes primary; also see the mention that changes are fsync'ed:

Fri Oct 22 15:05:53 [rs_sync] replSet syncThread: 10278 dbclient error communicating with server
Fri Oct 22 15:09:23 [conn5] CMD fsync: sync:1 lock:0

... it now detects the member on port 27017 is up, which is then chosen as primary, so we roll back our previous work:

Fri Oct 22 15:11:51 [ReplSetHealthPollTask] replSet info 192.168.2.63:27017 is now up |
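A minimal mongo shell sketch of such a configuration (the set name "foo" and the use of rs.initiate() here are assumptions, not taken from the report; the hosts and ports follow the logs in this ticket):

> rs.initiate({
      "_id" : "foo",
      "members" : [
          { "_id" : 0, "host" : "192.168.2.63:27017" },
          { "_id" : 1, "host" : "192.168.2.63:27018" },
          { "_id" : 2, "host" : "192.168.2.63:27019", "arbiterOnly" : true }
      ]
  })
> rs.isMaster()   // reports which member is currently primary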
| Comments |
| Comment by auto [ 20/Nov/10 ] |
|
Author: Kristina Chodorow (kchodorow, kristina@10gen.com)
Message: check for better primary before voting yea |
| Comment by Mark Kirkwood [ 28/Oct/10 ] |
|
Hey, my pleasure - was fun. |
| Comment by Kristina Chodorow (Inactive) [ 27/Oct/10 ] |
|
Great! Thanks for all the help debugging this. |
| Comment by Mark Kirkwood [ 27/Oct/10 ] |
|
Applied patch to my 1.7.1 src and rebuilt - looks fixed to me: several tries and we elect the right/best/freshest member to be primary each time. Fast fix! Tested with latest code from git, looks fixed there too. Nice! |
| Comment by Kristina Chodorow (Inactive) [ 27/Oct/10 ] |
|
Could you try the latest code (or the 1.7 nightly build tomorrow)? |
| Comment by auto [ 27/Oct/10 ] |
|
Author: Kristina Chodorow (kchodorow, kristina@10gen.com)
Message: check for better primary before voting yea |
| Comment by Kristina Chodorow (Inactive) [ 27/Oct/10 ] |
|
Thanks! Looks like a race condition (27017 tries to elect itself before it knows about 27018 and the arbiter goes along with it), fixing... |
| Comment by Mark Kirkwood [ 26/Oct/10 ] |
|
Poking about in the code a little and enabling some debugging output for Consensus::weAreFreshest shows that when 27017 comes up it thinks it is freshest (from the 27017 log):

11:13:34 [ReplSetHealthPollTask] replSet info localhost:27019 is now up

Getting into this a bit more, with some extra debugging code:

14:41:22 [rs Manager] replSet freshest returns { opTime: new Date(0), fresher: false, ok: 1.0 }
14:41:22 [rs Manager] replSet dev we are freshest of up nodes, nok:1 nTies:0

So looks to me like the optime for the ARBITER (27019) is being returned here... I'm trying to figure out why that is (note: stumbling through the code a bit so I may have misunderstood what is happening here) |
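A rough JavaScript illustration of the failure mode being described (the real logic is C++ in Consensus::weAreFreshest; the function and field names below are hypothetical):

// Hypothetical sketch, not the server code: an arbiter holds no data, so its
// freshness reply carries opTime new Date(0). If that reply is the one compared
// against, nothing ever looks fresher than the just-restarted, stale member.
function weAreFreshest(myOpTime, replies) {
    var someoneIsFresher = false;
    replies.forEach(function (r) {
        if (r.opTime > myOpTime) someoneIsFresher = true;
    });
    return !someoneIsFresher;
}

// 27017's optime is behind 27018's, but only the arbiter's Date(0) is seen:
print(weAreFreshest(new Date(100), [ { opTime: new Date(0) } ]));   // prints true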
| Comment by Mark Kirkwood [ 25/Oct/10 ] |
|
Hmm - weird, I tried on my home workstation (32 bit Ubuntu) and can reproduce consistently. Will attach logs. |
| Comment by Kristina Chodorow (Inactive) [ 25/Oct/10 ] |
|
Hmm, still can't reproduce with 1.7.1. Can you attach or paste the arbiter's and 27017's logs? |
| Comment by Kristina Chodorow (Inactive) [ 25/Oct/10 ] |
|
Thanks for the detailed instructions! I am unable to reproduce on 1.7.2-, I'm compiling 1.7.1 and will let you know. Hopefully it's been incidentally fixed. |
| Comment by Mark Kirkwood [ 25/Oct/10 ] |
|
Also: if it would be helpful I'll attach a complete 'how to produce' with the steps I used to set up the replicas plus the data import and update commands. Cheers! (edit) Since I have the file(s) I'll attach 'em anyway! |
| Comment by Mark Kirkwood [ 25/Oct/10 ] |
|
Good point, I went and retested to check. Both 27018 and 27017 were synced (the web console shows both at 4cc100be:1). I killed 27017 again, and updated some docs in a collection while connected to 27018 - which then advanced to 4cc4f4c5:3e7. I restarted 27017 and the web console now shows:

27017 - PRIMARY 4cc100be:1

Here is my rs.conf at that point:

> rs.conf()
    ...,
    ...,
    { "_id" : 2, "host" : "192.168.2.63:27019", "arbiterOnly" : true }
] |
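The "updated some docs" step could look roughly like the following (the collection and field names are made up for illustration, not taken from the report):

// Run while connected to 27018 with 27017 down, to advance 27018's optime past
// the common point 4cc100be:1.
> db.stuff.update({}, { $set: { touched: new Date() } }, false, true)   // multi-update
> db.getLastError()   // confirm the write applied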
| Comment by Kristina Chodorow (Inactive) [ 22/Oct/10 ] |
|
Sorry, rs.conf(). |
| Comment by Kristina Chodorow (Inactive) [ 22/Oct/10 ] |
|
It should use optime. Can you send rs.config()? Is 27018 fully synced when 27017 goes down? (You can check the web console, make sure the values in the optime column are the same.) Fsyncs don't matter. |
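A shell-side alternative to the web console's optime column might look like the sketch below (it assumes rs.status() exposes a per-member state and optime in this server version; the exact field names may differ between releases):

// Hedged sketch: print each member's reported state and optime.
> rs.status().members.forEach(function (m) {
      print(m.name + "  " + m.stateStr + "  " + m.optimeDate);
  })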
| Comment by Mark Kirkwood [ 22/Oct/10 ] |
|
Correction: I mention above that the log shows when I did the fsync - not entirely true: it actually shows that I forgot to change to the 'admin' db before doing the fsync! I did that immediately afterwards... honest... |
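For reference, the fsync command has to be issued against the admin database; a minimal example matching the "CMD fsync: sync:1 lock:0" line in the log above:

> use admin
> db.runCommand({ fsync: 1 })

// equivalently, without switching databases:
> db.getSiblingDB("admin").runCommand({ fsync: 1 })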