[SERVER-9464] On election, relax priority restriction when another member is fresher Created: 25/Apr/13  Updated: 10/Dec/14  Resolved: 23/Jul/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Matt Dannenberg
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File loadgen.rb     File test.sh     File test1.js    
Issue Links:
Duplicate
duplicates SERVER-9934 Slow failovers after step down due to... Closed
Related
is related to SERVER-10621 Replication failover hangs indefinite... Closed
Participants:

 Description   

Consider a replica set of three machines, dd-db1, dd-db2 and dd-db3, with priorities of 1.5, 1.0 and 0.7 respectively. dd-db1 is primary; the other two are secondaries. Now restart dd-db1 and dd-db3: the primary role shifts (according to priorities) to dd-db2, which is correct. Now restart dd-db2. It also loses its primary state, which is also correct. But if writes were coming in at a steady pace, the oplog of dd-db2 is now several operations ahead of dd-db1 and dd-db3. This leaves the replica set without a primary: dd-db2 is the freshest and should become primary, but dd-db1 is up and has a higher priority. I understand that it's a hard choice: either ignore priorities in favor of freshness, or ignore freshness (and possibly cause rollbacks, with likely data loss) in favor of priorities. I still think either of these solutions is better than leaving the replica set in an indefinite "no primary" state. By the way, temporarily shutting down the higher-priority server helps: the freshest server becomes primary, and the restarted higher-priority server catches up and becomes primary again after a new election.

P.S. We've seen this with 2.2 as well; we moved to 2.4, but it still appears to occur.
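
The stalemate described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the server's actual election code: optimes are reduced to plain integers (100 and 105 are made-up values), and a candidate is treated as vetoed whenever another reachable member is fresher or has a higher priority. The names `Member`, `electable` and `electable_members` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    priority: float
    optime: int  # last applied oplog position, simplified to an integer


def electable(candidate, reachable):
    """Return True if no other reachable member vetoes the candidate.

    Assumed rule for this sketch: a veto is cast if any other reachable
    member is fresher (higher optime) or has a higher priority.
    """
    others = [m for m in reachable if m.name != candidate.name]
    fresher = any(m.optime > candidate.optime for m in others)
    higher_priority = any(m.priority > candidate.priority for m in others)
    return not fresher and not higher_priority


def electable_members(reachable):
    """Names of all members that could win an election right now."""
    return [m.name for m in reachable if electable(m, reachable)]


# State after dd-db2 was restarted: it is a few operations ahead of the others.
members = [
    Member("dd-db1", 1.5, 100),
    Member("dd-db2", 1.0, 105),
    Member("dd-db3", 0.7, 100),
]

# All three up: dd-db2 is blocked by priority, the others by freshness.
print(electable_members(members))  # []

# dd-db1 temporarily shut down: dd-db2 is no longer blocked.
print(electable_members([m for m in members if m.name != "dd-db1"]))  # ['dd-db2']
```

In this model the empty list corresponds to the "no primary" state, and removing dd-db1 from the reachable set corresponds to the workaround of temporarily shutting down the higher-priority server.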



 Comments   
Comment by Aristarkh Zagorodnikov [ 26/Aug/13 ]

Moved all my late comments to SERVER-10621.

Comment by Matt Dannenberg [ 23/Jul/13 ]

This is fixed by SERVER-9934. I've marked SERVER-9934 for backport as a result. Please follow that ticket for updates.

Comment by Aristarkh Zagorodnikov [ 23/Jul/13 ]

I honestly doubt that the OS choice is important here; I believe Arch would do fine (CentOS 6.4 behaved the same way, for example). Yes, stopping writes solves the problem, but it appears that the writes are at least part of the reason behind the problem itself. db0 might become primary; this is what I meant by "repro works in about 85%...": the remaining 15% represent the case when everything works normally. As for hanging at "db1 to become primary", I'm not sure if I had this problem (I will do some testing), but you can just kill the script if it hangs and inspect the rs.status() of the remaining servers (db1 & db2) for any hints about the reason behind this specific issue.

Comment by Matt Dannenberg [ 23/Jul/13 ]

I spent a bit more time on this on Friday, but could not find the root cause of your problem. However, I did find that if write operations were halted before the primary came back online, the situation seemed to resolve as expected (db0 becomes primary). Also, by random chance, your script occasionally results in db0 becoming primary. Lastly, I found that the repro script would fairly frequently hang indefinitely at 'Waiting for db1 to become primary' for me. Does it work more consistently for you? I am running Arch rather than Ubuntu and wonder if it might be worth setting up Ubuntu to work on this.

Comment by Aristarkh Zagorodnikov [ 15/Jul/13 ]

No problem, it's great that the repro worked.

Comment by Matt Dannenberg [ 15/Jul/13 ]

Sorry for the lack of response, I was on vacation.

I was able to run your repro and it worked as described. We are pushing out a release this week, but hopefully I can spend more time on this after that.

Comment by Aristarkh Zagorodnikov [ 08/Jul/13 ]

Hi!
Were you able to repro the issue in your environment?

Comment by Aristarkh Zagorodnikov [ 01/Jul/13 ]

Again, please excuse me for the long silence; I had a lot on my hands after a trip to MongoNYC.
I created a small repro tool. It requires a modern Linux (I used Ubuntu 12.04.2 LTS), the mongod and mongo binaries in the path, Ruby (1.8.7+) and the 10gen Ruby driver (I used 1.9.0).
I tested against MongoDB 2.4.4. The repro works about 85% of the time. After installing the aforementioned prerequisites, just copy loadgen.rb and test.sh to any directory that is writable by the current user, and execute the test.sh script. Its progress is pretty straightforward; you can examine the script itself.

Comment by Matt Dannenberg [ 26/Jun/13 ]

I wrote a test that I believe checks what Scott proposed as the issue and it appears to show that the conclusions reached above are not correct. I've attached the test so that you can see what it does and confirm my beliefs.

Let me know if you have a step-by-step (or better still automated) repro.

Comment by Aristarkh Zagorodnikov [ 25/Jun/13 ]

Yes, this should solve this case.

Comment by Scott Hernandez (Inactive) [ 24/Jun/13 ]

I think the problem is in this case, where no chaining is allowed. There we really want the setting to mean "prefer primary", not "only primary", as a replication source, so that the high-priority node can become the freshest and be elected. I believe what we want in this situation is for the higher-priority node to replicate from the non-primary replica (with lower priority) and become the freshest.
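
The difference between "only primary" and "prefer primary" can be sketched as a sync-source selection function. This is a hypothetical illustration, not the server's implementation: the function name `choose_sync_source`, the policy strings and the integer optimes are all assumptions made for the example.

```python
def choose_sync_source(me, optimes, primary=None, policy="primary_only"):
    """Return the name of the member to replicate from, or None.

    me      -- name of the member looking for a sync source
    optimes -- dict mapping member name to its (simplified, integer) optime
    primary -- name of the current primary, or None if there is none
    policy  -- "primary_only" (chaining disabled, strict reading) or
               "prefer_primary" (fall back to a fresher secondary)
    """
    if primary == me:
        return None  # a primary does not replicate from anyone
    if primary is not None:
        return primary  # both policies use the primary when one exists
    if policy == "primary_only":
        return None  # no primary, and secondaries are not allowed as sources
    # prefer_primary: without a primary, pick the freshest member ahead of us
    ahead = {name: t for name, t in optimes.items()
             if name != me and t > optimes[me]}
    if not ahead:
        return None
    return max(ahead, key=ahead.get)


# No primary; dd-db2 is ahead of the other two members.
optimes = {"dd-db1": 100, "dd-db2": 105, "dd-db3": 100}

print(choose_sync_source("dd-db1", optimes, None, "primary_only"))    # None
print(choose_sync_source("dd-db1", optimes, None, "prefer_primary"))  # dd-db2
```

Under the strict policy dd-db1 has nothing to sync from and can never catch up; under the relaxed one it can pull the missing operations from dd-db2 and then stand for election.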

Comment by Aristarkh Zagorodnikov [ 22/Jun/13 ]

After talking with Scott Hernandez today, I remembered that I should note that we have chained replication disabled, so the secondary with the higher priority can't catch up from the fresher one.

Comment by Aristarkh Zagorodnikov [ 07/Jun/13 ]

Matt, sorry for the long delay, I'm currently very busy. Will do something about repro steps in the coming days.

Comment by Matt Dannenberg [ 04/Jun/13 ]

I am having trouble reproducing what you've described above. Can you give me the step by step commands used to produce this behavior?

Comment by Aristarkh Zagorodnikov [ 08/May/13 ]

Good to hear it's going to be improved, thanks.

Comment by Eric Milkie [ 07/May/13 ]

Rollbacks are very undesirable. The documentation says that priorities help guide which member gets elected but they are not absolute.
In the situation outlined in the description above, we should elect the lower-priority, freshest member. The primary role will subsequently move to the higher-priority node after it has caught up.
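
The behavior proposed here can be sketched in two steps: elect the freshest member first, then hand over once the others have caught up. This is a simplified model under stated assumptions, not the actual election protocol: members are plain `(name, priority, optime)` tuples, optimes are made-up integers, the handover is modeled as a second election, and the helper names `elect` and `catch_up` are hypothetical.

```python
def elect(members):
    """Pick a primary: the freshest member wins, priority breaks ties.

    members is a list of (name, priority, optime) tuples.
    """
    return max(members, key=lambda m: (m[2], m[1]))[0]


def catch_up(members, primary):
    """Return the member list after everyone has replicated from the primary."""
    top = next(optime for name, _, optime in members if name == primary)
    return [(name, priority, max(optime, top))
            for name, priority, optime in members]


members = [
    ("dd-db1", 1.5, 100),
    ("dd-db2", 1.0, 105),
    ("dd-db3", 0.7, 100),
]

first = elect(members)               # freshness decides
members = catch_up(members, first)   # the others replicate the missing ops
second = elect(members)              # optimes are equal, so priority decides

print(first)   # dd-db2
print(second)  # dd-db1
```

No rollback is needed in this sequence, because dd-db1 only takes over after it holds every operation that dd-db2 had.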

Generated at Thu Feb 08 03:20:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.