[SERVER-22502] Replication Protocol 1 rollbacks are more likely during priority takeover Created: 07/Feb/16  Updated: 21/Sep/16  Resolved: 21/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Yoni Douek Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: RF
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File arb, HTML File crash, HTML File primary, HTML File secondary
Issue Links:
Duplicate
duplicates SERVER-23663 New primary syncs from chosen node to... Closed
duplicates SERVER-18453 Avoiding Rollbacks in new Raft based ... Closed
Related
is related to SERVER-22504 Do not blindly add self to heartbeat ... Closed
is related to SERVER-11086 Election handoff to new primary, duri... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

General note: I know that the title is too general, but this is the 3rd bug I'm opening this week. We have another one coming for 3.2.1 related to sharding, which we will publish soon. We are considering moving away from MongoDB; the reliability of 3.2 is horrible!

2 bugs in this ticket:

1. We removed a member using rs.remove(). After that, the removed member (whose log is attached; filename: crash) got into a versioning mess and killed itself.

2. The second time, we got the following behavior: a member elects itself, although it doesn't need to, and causes a rollback on the other member.
Our setup: primary, secondary and arbiter.
Primary: rs.stepDown() for maintenance.
Secondary takes over.
When the primary is back, it starts syncing. As you can see from the logs, during this time it receives 2 "no" votes since it is still stale, but then it receives only 1 "yes" vote (for some reason, the arbiter is quiet) and is elected before its time. This causes a rollback on the other node.
All 3 nodes' logs are attached (primary, secondary, arb). Please note the following lines:

2016-02-07T09:38:14.612+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from in.db2m2.mydomain.com:27017 because: candidate's data is staler than mine, resp:{ term: 3, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
2016-02-07T09:38:14.612+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from in.db2arb.mydomain.com:27017 because: candidate's data is staler than mine, resp:{ term: 3, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }

and about 11 seconds later, suddenly:

2016-02-07T09:38:25.613+0000 I REPL     [ReplicationExecutor] VoteRequester: Got no vote from in.db2m2.mydomain.com:27017 because: candidate's data is staler than mine, resp:{ term: 3, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
2016-02-07T09:38:25.613+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2016-02-07T09:38:25.614+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 4
2016-02-07T09:38:25.614+0000 I REPL     [ReplicationExecutor] transition to PRIMARY

All members are on protocol version 1. They were on version 0 but were upgraded about a week ago according to your docs.
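
For reference, the documented upgrade is a replica set reconfig run against the primary; a minimal sketch:

cfg = rs.conf()
cfg.protocolVersion = 1   // switch the replica set from election protocol version 0 to 1
rs.reconfig(cfg)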



 Comments   
Comment by Andy Schwerin [ 12/Apr/16 ]

Re-opening to mark as duplicate of feature request SERVER-18453.

Comment by Eric Milkie [ 18/Feb/16 ]

I am closing this out due to lack of activity, but it can be immediately reopened if there are further questions.
Issue summary:
Rollbacks are more likely during priority takeover elections when using protocol version 1 than when using protocol version 0. While this is unfortunate, the real solution to avoid reading and writing data that may roll back is to use a write concern that guarantees the data will not roll back. Using a write concern that ignores the progress of secondaries will get you super fast performance at the expense of losing written data to rollbacks caused by changes in replica set leadership.
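
A minimal sketch of such a write from the mongo shell (the database, collection, and document are illustrative):

db.orders.insertOne(
  { _id: 1, status: "shipped" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }   // wait until a majority of voting members have the write; an arbiter cannot acknowledge, so in a PSA set this means both data-bearing nodes
)

Writes acknowledged this way are committed and will not be rolled back by a later change in leadership.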

Comment by Eric Milkie [ 10/Feb/16 ]

If you're not using w:majority write concern for your writes, it is possible to lose such writes even when they appear to be successful, regardless of MongoDB version or replication protocol version.
If you want the priority takeover behavior that was present in 3.0, it is fine to run with protocol version 0 in version 3.2; it is not deprecated.
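
A sketch of reverting to protocol version 0, the reverse of the reconfig sketched in the description above, run against the primary:

cfg = rs.conf()
cfg.protocolVersion = 0   // restore the protocol version 0 (3.0-style) election and priority takeover behavior
rs.reconfig(cfg)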

Comment by Yoni Douek [ 09/Feb/16 ]

Majority write concern is irrelevant in our case (primary, secondary and arbiter), and we don't want to impact our write speeds, so that's not an option. Bottom line: we're getting rollbacks, and in previous versions we didn't. We had rollbacks of 7ms, but also of 11sec and even >50sec. Are we bound to suffer data loss just because we upgraded the protocolVersion?

Comment by Eric Milkie [ 08/Feb/16 ]

Hi Yoni.
We filed a separate ticket to fix the crash you experienced in the removed member: SERVER-22504. The cause is unrelated to code that was added for version 3.2 and has been present as a bug since version 3.0.

Regarding the behavior of a 3-node replica set with one arbiter, I do not see anything in your description that is unexpected for that configuration. There was one update issued to the lower priority primary at about the same moment that the higher priority secondary achieved a successful election for priority takeover; this update was rolled back after the election. This behavior can happen in both protocol versions 0 and 1. However, it may be more likely to happen in protocol version 1 due to the new way that priorities are enforced. At no time, however, are any committed writes rolled back. Writes that are written to a majority of nodes are considered committed and will never roll back; you can issue reads to a view of the data that only contains committed writes by using read concern level majority.
SERVER-11086 is scheduled to be implemented soon, which will avoid rollbacks when doing a controlled handoff of primaryship from one functioning node to another, as done during a replSetStepDown command or a priority takeover.
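
A sketch of such a majority read from the mongo shell; the database and collection names are illustrative, and on a 3.2-era mongod this assumes the WiredTiger storage engine with --enableMajorityReadConcern enabled:

db.runCommand({
  find: "orders",
  filter: { status: "shipped" },
  readConcern: { level: "majority" }   // return only data acknowledged by a majority, i.e. data that cannot be rolled back
})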

Comment by Yoni Douek [ 07/Feb/16 ]

Correction - "but then - it receives only 1 "no" * vote".
