[SERVER-14885] replica sets that disable chaining may have trouble electing a primary if members have different priorities Created: 13/Aug/14 Updated: 06/Dec/22 Resolved: 26/Oct/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zardosht Kasheff | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | elections, mms-s | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Description |
|
This report comes mostly from code inspection.

When chaining is not allowed in replication, ReplSetImpl::getMemberToSyncTo only allows a secondary to sync from the primary. If the primary cannot be reached, no syncing happens. In consensus.cpp, an election will refuse to elect a member with a lower priority if a member with a higher priority exists and is within 10 seconds of being caught up. Together, these two facts can prevent a replica set from ever electing a primary.

Take the following scenario. Chaining is disabled and no primary exists. Member A has priority 10 (the highest in the set) and is 5 seconds behind member B, which has priority 1. B is the furthest along. Neither A nor B will ever be elected. B won't be elected because the election algorithm will say "A is within 10 seconds and has a higher priority". A won't be elected because it is behind B and, because chaining is disallowed, it cannot replicate from B to catch up.

I think the end result is that a primary never gets elected. I don't see any code that says "ignore the chainingAllowed bit and replicate off a secondary because a primary does not exist". |
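The scenario in the description can be sketched as a toy simulation. This is not the server's code: `Member`, `can_be_elected`, `sync_source`, and `run_rounds` are hypothetical names, and the two rules are simplified restatements of the behavior the report attributes to consensus.cpp and ReplSetImpl::getMemberToSyncTo. It only illustrates how the two rules might combine into a deadlock.

```python
from dataclasses import dataclass

# Toy model only; every name below is hypothetical, not MongoDB's API.
FRESHNESS_WINDOW_SECS = 10  # "within 10 seconds of being caught up"


@dataclass
class Member:
    name: str
    priority: int
    optime: int  # seconds of oplog this member has applied


def can_be_elected(candidate, members):
    """Simplified election rule as described in the report."""
    freshest = max(m.optime for m in members)
    if candidate.optime < freshest:
        return False  # another member is further along
    for m in members:
        close_enough = freshest - m.optime <= FRESHNESS_WINDOW_SECS
        if m.priority > candidate.priority and close_enough:
            return False  # a higher-priority member is nearly caught up
    return True


def sync_source(member, members, primary, chaining_allowed):
    """Simplified sync-source rule: primary only, unless chaining is allowed."""
    if primary is not None:
        return primary
    if not chaining_allowed:
        return None  # no primary and no chaining: nothing to sync from
    ahead = [m for m in members if m.optime > member.optime]
    return max(ahead, key=lambda m: m.optime) if ahead else None


def run_rounds(members, chaining_allowed, rounds=20):
    """Return the name of the elected member, or None if no election succeeds."""
    for _ in range(rounds):
        for m in members:
            if can_be_elected(m, members):
                return m.name
        # No primary was elected this round; members try to catch up.
        for m in members:
            src = sync_source(m, members, None, chaining_allowed)
            if src is not None:
                m.optime = src.optime
    return None


if __name__ == "__main__":
    stuck = [Member("A", 10, 95), Member("B", 1, 100)]
    print(run_rounds(stuck, chaining_allowed=False))
    chained = [Member("A", 10, 95), Member("B", 1, 100)]
    print(run_rounds(chained, chaining_allowed=True))
```

In this model, A can catch up from B only when chaining is allowed, and only then can an election succeed. With chaining disabled, neither rule ever changes the state, so every round ends the same way.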
| Comments |
| Comment by Eric Milkie [ 26/Oct/15 ] |
|
In 3.1.9, the new election protocol no longer has this problem. Priorities are handled in a new way. |