[SERVER-35835] Allow quicker sync source change when a new Primary is elected Created: 27/Jun/18  Updated: 26/Jul/18  Resolved: 06/Jul/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.4.14
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Dmitry Ryabtsev Assignee: Tess Avitabile (Inactive)
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-35200 Speed up failure detection in the Opl... Closed
Related
is related to SERVER-35996 Create performance tests for measurin... Closed
is related to SERVER-35200 Speed up failure detection in the Opl... Closed
Sprint: Repl 2018-07-30
Participants:
Case:

 Description   

It has been observed that, with chained replication disabled, when the current primary becomes unresponsive and the secondaries elect a new primary, the remaining secondaries keep syncing from the original primary for a notable amount of time instead of switching to the new one as soon as it transitions to PRIMARY. This causes the following issues:

  • The new primary will fail to acknowledge w:2+ writes since there are no secondaries syncing from it, effectively making the outage longer.
  • If the original primary gets unblocked, there is likely to be a rollback not only on that primary but also on the secondaries.
  • The rollback can happen on a majority of the replica set members.

It would be better if the secondaries could re-evaluate their sync source immediately after the new primary becomes available for writes.
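
To illustrate the first issue above, here is a minimal toy model in Python. It is not MongoDB code: the `Member` class, the `ack_count` and `can_acknowledge` helpers, and the three-member set A/B/C are all hypothetical names chosen for this sketch. It assumes chaining is disabled, so a secondary only contributes to acknowledging a write if it syncs directly from the current primary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Member:
    name: str
    # Name of the member this node replicates from; None if it is not syncing
    # from anyone (for example, because it is the primary).
    sync_source: Optional[str]
    reachable: bool = True


def ack_count(members: Dict[str, Member], primary: str) -> int:
    """Count members that can acknowledge a write applied on `primary`.

    Toy assumption: the primary always counts, and a secondary counts only
    if it is reachable and syncs directly from the primary.
    """
    count = 1  # the primary itself
    for m in members.values():
        if m.name != primary and m.reachable and m.sync_source == primary:
            count += 1
    return count


def can_acknowledge(members: Dict[str, Member], primary: str, w: int) -> bool:
    """Return True if a numeric write concern `w` can be satisfied."""
    return ack_count(members, primary) >= w


def build_scenario() -> Dict[str, Member]:
    """A is the old, unresponsive primary; B is newly elected; C still syncs from A."""
    return {
        "A": Member("A", None, reachable=False),
        "B": Member("B", None),
        "C": Member("C", "A"),
    }
```

In this model, `can_acknowledge(build_scenario(), "B", 2)` is false while C still points at A, and becomes true once C's `sync_source` is changed to `"B"`, which is the behaviour this ticket asks to reach sooner.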



 Comments   
Comment by Spencer Brody (Inactive) [ 28/Jun/18 ]

An acknowledged write could only roll back if the write concern was less than w:majority, regardless of wtimeout. A write that was successfully acknowledged with w:majority write concern must never roll back.

Generated at Thu Feb 08 04:41:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.