[SERVER-14885] replica sets that disable chaining may have trouble electing a primary if members have different priorities Created: 13/Aug/14  Updated: 06/Dec/22  Resolved: 26/Oct/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.1.9

Type: Bug Priority: Major - P3
Reporter: Zardosht Kasheff Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: elections, mms-s
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-19118 Always allow replica to replicate fro... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

This report comes mostly from code inspection. When chaining is not allowed in replication, ReplSetImpl::getMemberToSyncTo only allows a secondary to sync from the primary. If a primary cannot be reached, syncing does not happen.
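
The sync source rule described above can be sketched roughly as follows. This is a simplified model for illustration, not the actual ReplSetImpl::getMemberToSyncTo code: the Member class, its fields, and choose_sync_source are hypothetical names, and the chaining-allowed branch simply picks the freshest member rather than reproducing the real selection logic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    # Hypothetical model of a replica set member, for illustration only.
    name: str
    is_primary: bool
    optime: int  # seconds of oplog applied so far


def choose_sync_source(me: Member, others: List[Member],
                       chaining_allowed: bool) -> Optional[Member]:
    """Pick a member to sync from, following the rule described in this report."""
    if not chaining_allowed:
        # With chaining disabled, only the primary is an acceptable source.
        for m in others:
            if m.is_primary:
                return m
        return None  # no primary reachable: no syncing happens
    # With chaining allowed, assume any member that is ahead of us will do;
    # this sketch just picks the freshest one.
    ahead = [m for m in others if m.optime > me.optime]
    return max(ahead, key=lambda m: m.optime) if ahead else None


# No primary exists; A is 5 seconds behind B.
a = Member("A", is_primary=False, optime=95)
b = Member("B", is_primary=False, optime=100)
no_chaining_source = choose_sync_source(a, [b], chaining_allowed=False)
chaining_source = choose_sync_source(a, [b], chaining_allowed=True)
```

In this model, A has no sync source when chaining is disabled and no primary exists, while with chaining allowed it would sync from B.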

In consensus.cpp, an election will refuse to elect a member with a lower priority if a member with a higher priority exists and is within 10 seconds of being caught up.

These two facts together can cause a replica set to never elect a primary.

Take the following scenario. Chaining is disabled, and no primary exists. Member A has priority 10 (the highest in the set) and is 5 seconds behind member B, which has priority 1. B is the furthest along. Neither A nor B will ever be elected. B won't be elected because the election algorithm will say "A is within 10 seconds and has a higher priority". A won't be elected because it is behind B and, because chaining is disallowed, it cannot replicate from B to catch up.
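
To make the scenario concrete, here is a toy simulation of it. It is a sketch under stated assumptions, not the logic in consensus.cpp: Member, can_be_elected, run_rounds, and PRIORITY_WINDOW_SECS are hypothetical names, the two veto conditions are taken from the description in this report, and the chaining-allowed branch assumes that lagging members can catch up from the freshest member between election attempts.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIORITY_WINDOW_SECS = 10  # the "within 10 seconds" window from the report


@dataclass
class Member:
    # Hypothetical model of a replica set member, for illustration only.
    name: str
    priority: int
    optime: int  # seconds of oplog applied so far


def can_be_elected(candidate: Member, members: List[Member]) -> bool:
    for m in members:
        if m is candidate:
            continue
        # Vetoed: another member is further along than the candidate.
        if m.optime > candidate.optime:
            return False
        # Vetoed: a higher-priority member is within the catch-up window.
        if (m.priority > candidate.priority
                and candidate.optime - m.optime <= PRIORITY_WINDOW_SECS):
            return False
    return True


def run_rounds(members: List[Member], chaining_allowed: bool,
               rounds: int = 5) -> Optional[str]:
    """Return the name of the elected member, or None if nobody wins."""
    for _ in range(rounds):
        for m in members:
            if can_be_elected(m, members):
                return m.name
        # No primary exists. Without chaining nobody can sync, so optimes
        # stay frozen; with chaining, laggards catch up to the freshest member.
        if chaining_allowed:
            freshest = max(x.optime for x in members)
            for m in members:
                m.optime = freshest
    return None


stuck = run_rounds([Member("A", 10, 95), Member("B", 1, 100)],
                   chaining_allowed=False)
elected = run_rounds([Member("A", 10, 95), Member("B", 1, 100)],
                     chaining_allowed=True)
```

In this model, the run with chaining disabled never produces a primary, while the run with chaining allowed lets A catch up to B and win on the second attempt.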

I think the end result is that a primary never gets elected.

I don't see any code that says "ignore the chainingAllowed bit and replicate off a secondary because a primary does not exist".



 Comments   
Comment by Eric Milkie [ 26/Oct/15 ]

In 3.1.9, the new election protocol no longer has this problem. Priorities are handled in a new way.
