Core Server / SERVER-61519

Term without primary should not last

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: 5.1.0, 5.0.4
    • Component/s: None
    • Labels: None
    • Replication
    • ALL
    • 15

      The problem reproduced in the test is that, while every node behaved properly, the replica set found itself in a state in which no primary will ever be elected. In particular:

      The replicas are randomly restarted and randomly stepped up while the compatibility version is also randomly changed. The race happened when n0 initiated an election but:

      1. n2 was just killed and thus did not participate in the election
      2. n1 stepped down because it received an election request from n0
      3. n0 ignored the vote from n1 because n1's config was older than n0's
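      A minimal sketch of the race (the node names are from the ticket; the config versions, data shapes, and majority rule are illustrative assumptions, not the actual test or server code):

```python
# Hypothetical reconstruction of the three-way race: n2 is down, n1 steps
# down on the vote request, and n0 discards n1's vote because n1's config
# version lags behind n0's. The election therefore cannot reach a majority.

def run_election(candidate, voters, config_versions):
    """Candidate requests votes; returns True on a majority of the 3-node set."""
    votes = 1  # the candidate votes for itself
    for node in voters:
        if not node["alive"]:
            continue                       # n2: killed, never answers
        node["state"] = "SECONDARY"        # n1: steps down on the vote request
        if config_versions[node["name"]] < config_versions[candidate]:
            continue                       # n0 ignores the stale-config vote
        votes += 1
    return votes >= 2                      # majority of a 3-node set

config_versions = {"n0": 5, "n1": 4, "n2": 5}   # n1's config lags behind n0's
n1 = {"name": "n1", "alive": True, "state": "PRIMARY"}
n2 = {"name": "n2", "alive": False, "state": "DOWN"}

won = run_election("n0", [n1, n2], config_versions)
print(won)   # False: the term advanced, but nobody became primary
```

      The point of the sketch is that every node followed its local rules, yet the set as a whole ended the term with no primary.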

      There are two ways to deal with it. The first would be to address this particular race ad hoc, e.g. make n1 track that n0's election actually failed. However, this would be strange, because it is n0's business to track its own election. It might be better to make n0 run again, assuming n1's config will eventually catch up.

      I'm thinking that a preferable solution would be to treat this as a gap in our Raft implementation: make a node watch whether, for the current term, no node has thought itself primary for a certain amount of time. It might be OK to receive '"primaryId": -1' sometimes while having no info on the current primary itself. But eventually, if no node at all declares that it knows the primary for the current term, this should trigger a new election.
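      A sketch of that watchdog idea (class and field names are illustrative assumptions, not MongoDB internals; the timeout value is arbitrary):

```python
import time

# Hypothetical "no primary for this term" watchdog: track the last time any
# heartbeat named a primary for the current term, and call for an election
# once that has not happened for NO_PRIMARY_TIMEOUT seconds.
NO_PRIMARY_TIMEOUT = 10.0

class NoPrimaryWatchdog:
    def __init__(self, term, now=time.monotonic):
        self.now = now                     # injectable clock for testing
        self.term = term
        self.last_primary_seen = self.now()

    def on_heartbeat(self, term, primary_id):
        if term > self.term:               # new term: restart the clock
            self.term = term
            self.last_primary_seen = self.now()
        elif term == self.term and primary_id != -1:
            self.last_primary_seen = self.now()   # someone knows a primary

    def should_call_election(self):
        return self.now() - self.last_primary_seen > NO_PRIMARY_TIMEOUT

# Usage with a fake clock: term 7 has had no known primary for 11 seconds.
clock = [0.0]
wd = NoPrimaryWatchdog(term=7, now=lambda: clock[0])
clock[0] = 11.0
wd.on_heartbeat(term=7, primary_id=-1)    # a peer also sees no primary
print(wd.should_call_election())          # True: time to call an election
```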

      The complexity: what if the node is cut off from the others by a network failure and cannot learn of a new primary? It might be sufficient to require that it has received at least one heartbeat, within this timeout, from a node that is also not aware of a primary. If we run a 3-replica RS, we are done. If we are running a 5 (or 5+)-replica set, it means there might be a disjoint cluster somewhere with consensus, and this pair of disconnected voters won't get enough votes anyway.
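      The safeguard can be sketched on top of the same watchdog idea (again an illustrative sketch, not server code): an election is called only if, within the timeout window, we also heard from at least one peer that likewise reports no primary. A fully partitioned node hears nothing at all, so it stays quiet.

```python
import time

# Hypothetical guarded watchdog: "no primary seen for a while" alone is not
# enough; we also need a recent heartbeat from a peer reporting primaryId -1.
class GuardedWatchdog:
    def __init__(self, term, timeout=10.0, now=time.monotonic):
        self.term, self.timeout, self.now = term, timeout, now
        self.last_primary_seen = self.now()
        self.last_confused_peer = None     # last peer heartbeat with primaryId -1

    def on_heartbeat(self, term, primary_id):
        if term != self.term:
            return
        if primary_id == -1:
            self.last_confused_peer = self.now()
        else:
            self.last_primary_seen = self.now()

    def should_call_election(self):
        t = self.now()
        no_primary_long_enough = t - self.last_primary_seen > self.timeout
        peer_agrees = (self.last_confused_peer is not None
                       and t - self.last_confused_peer <= self.timeout)
        return no_primary_long_enough and peer_agrees

# A partitioned node times out but never heard a confused peer: stays quiet.
clock = [0.0]
node = GuardedWatchdog(term=3, now=lambda: clock[0])
clock[0] = 20.0
print(node.should_call_election())        # False: no peer confirmed -1
node.on_heartbeat(term=3, primary_id=-1)  # a reachable peer is also confused
print(node.should_call_election())        # True: two nodes agree, elect
```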

      The vanilla Raft that we respect states that a replica becomes a Candidate when it does not receive a heartbeat from the primary for a timeout. The case behind this error is a primary that is unknown (-1), which it seems we don't handle properly.
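      For contrast, the textbook-Raft behavior referred to above can be sketched as follows (a sketch only; the timeout bounds are the illustrative 150-300 ms from the Raft paper, not MongoDB settings). The key property is that the randomized election timer resets only when a known leader speaks, so any stretch with an unknown primary necessarily ends in candidacy:

```python
import random, time

class Follower:
    def __init__(self, now=time.monotonic):
        self.now = now                     # injectable clock for testing
        self.deadline = self.now() + random.uniform(0.15, 0.30)
        self.state = "Follower"

    def on_leader_heartbeat(self):
        # Reset only when a *known* leader speaks; an unknown primary (-1)
        # never resets the timer, so the timeout must eventually fire.
        self.deadline = self.now() + random.uniform(0.15, 0.30)

    def tick(self):
        if self.now() >= self.deadline:
            self.state = "Candidate"       # start an election for term + 1
        return self.state

# With no leader heartbeats at all, the timer fires and the node runs.
clock = [0.0]
f = Follower(now=lambda: clock[0])
clock[0] = 1.0                             # well past any 150-300 ms deadline
print(f.tick())                            # Candidate
```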

      Question: why does our implementation allow an unknown primary (-1) for a term, as reproduced by the test? My understanding is that the term should not advance before a majority is reached. Is it a peculiarity of our implementation that the term is advanced at election start rather than at election success? If that's by design, I would presume it might be too much refactoring to fix, and we should rather proceed as I discussed.

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            Votes:
            0
            Watchers:
            5
