- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- None
- Affects Version/s: 2.6.3
- Component/s: Replication
- ALL

In certain scenarios, a replica set with member priorities can end up in a state where no PRIMARY can be elected until some election-triggering event occurs.
For example, with data nodes A, B and C and arbiters D and E (B and D being on the same machine), the following sequence of events caused this to occur:
Priority = 1 for A, C
Priority = 0.5 for B
Timeline | A-State | B-State | C-State | Comments |
---|---|---|---|---|
T + 0 | Primary | Secondary | Secondary | |
T + 1 | Not Reachable | Primary | Not Reachable | 'A' and 'C' not reachable from 'E'-Arb, 'B' selected Primary |
T + 2 | Recovering | Secondary | Secondary | |
T + 3 | Primary | Secondary | Secondary | 'B' stepped down because of lower priority, 'E'-Arb not able to see any primary |
T + 4 | Not Reachable | Secondary | Not Reachable | 'A' and 'C' not reachable from 'B' and 'D'-Arb |
T + 5 | Not Reachable | Primary | Not Reachable | 'B' elected Primary, since it was not reachable from 'B', 'D'-Arb, 'E'-Arb |
T + 6 | Secondary | Primary | Secondary | 'A' relinquished Primary since 'B' was more recently elected Primary, 'A' syncing to 'C' |
T + 7 | Rollback | Primary | Secondary | |
T + 8 | Recovering | Secondary | Secondary | 'A', while still Recovering, steps down 'B', which has lower priority and is only 4 seconds ahead of 'C' |
T + 9 | Secondary | Secondary | Secondary | 'A' and 'C' do not stand for election because they are not the freshest, which very likely implies that 'B' has the latest optime (no rollback was seen when it was stepped down by 'A'). 'B' does not elect itself, reporting that 'E'-Arb will veto because of its lower priority. 'B' is ahead of 'C' by a few seconds |
T + XX | Down | Secondary | Primary | Shutting down 'A' causes sync target change for 'C' / rollback followed by a fresh election in the replica-set |
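
The stuck state at T + 9 can be illustrated with a toy model. This is only a sketch of the two conditions described in the table (a node does not stand if it is not the freshest, and a lower-priority node is vetoed), not MongoDB's actual election code. The `Node` class, the `can_stand` function and the optime values are hypothetical; the only figure taken from the report is that 'B' is a few seconds (4 at T + 8) ahead of 'C', and 'A' is assumed to be level with 'C' because it was syncing from it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: float
    optime: int  # hypothetical optime in seconds; higher means fresher


def can_stand(candidate, data_nodes):
    """Return (eligible, reason) under two simplified rules from the report."""
    # Rule 1: a node that is behind the freshest member does not stand.
    freshest = max(n.optime for n in data_nodes)
    if candidate.optime < freshest:
        return False, "not freshest"
    # Rule 2: a node with lower priority than another member gets vetoed.
    top_priority = max(n.priority for n in data_nodes)
    if candidate.priority < top_priority:
        return False, "vetoed: higher-priority member exists"
    return True, "eligible"


# Assumed state at T + 9: B is 4 seconds ahead of A and C.
nodes = [
    Node("A", priority=1.0, optime=96),
    Node("B", priority=0.5, optime=100),
    Node("C", priority=1.0, optime=96),
]

results = {n.name: can_stand(n, nodes) for n in nodes}
for name, (eligible, reason) in results.items():
    print(name, eligible, reason)

# True when at least one data node could become primary.
any_eligible = any(eligible for eligible, _ in results.values())
print("primary can be elected:", any_eligible)
```

Under these assumptions no data node is eligible: 'A' and 'C' fail the freshness check and 'B' fails the priority check, so the set stays without a PRIMARY until something changes the inputs, as the shutdown of 'A' did at T + XX.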