[SERVER-32237] Nodes that cannot become primary must neither update progress nor vote "aye"
Created: 08/Dec/17  Updated: 06/Dec/22  Resolved: 05/Feb/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Backlog - Replication Team
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-17934 Do not report upstream progress while... Closed
Related
related to DOCS-11115 Update documentation on adding a new ... Closed
is related to SERVER-32185 Freshly synced secondaries respond to... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

Consider a 3 node replica set with a primary, a secondary, and a voting-unelectable node (rollback, initial sync, or recovering). Consider the case where all nodes are replicating from the primary. The primary takes writes at times T1, T2, and T3 with w:majority. The secondary replicates the write at T1, and the voting-unelectable node replicates the writes at T1 and T2. The primary will see that T1 and T2 are both replicated to a majority and it will commit them and acknowledge them to the client.

Now consider what happens if the primary crashes. The secondary is behind the voting-unelectable node, so the voting-unelectable node won't vote for it (and must not, because electing the secondary would lose the majority-committed write at T2), and the voting-unelectable node itself cannot be elected. We will thus not be able to elect a primary. If the unelectable node is also inconsistent, this is even worse because there is no way to make it electable. Thus we should not update our progress if we're unelectable.
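
The scenario can be replayed with a small model. This is only a sketch under simplifying assumptions, not the server's logic: writes T1..T3 are modelled as integers 1..3, a voter says "aye" only if the candidate is at least as up to date as the voter, and the names Node, commit_point, and winnable_candidates are hypothetical. It shows that T2 is committed only because the unelectable node reports progress, and that no candidate can win once the primary is gone.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_applied: int  # newest replicated write (T1 -> 1, T2 -> 2, ...)
    electable: bool    # False while in rollback, initial sync, or recovering
    alive: bool = True


def commit_point(nodes, unelectable_report_progress):
    """Newest write that a majority of voting nodes have reported replicating."""
    reported = sorted(
        (n.last_applied if (n.electable or unelectable_report_progress) else 0
         for n in nodes),
        reverse=True,
    )
    majority = len(nodes) // 2 + 1
    return reported[majority - 1]


def winnable_candidates(nodes):
    """Live electable nodes that could collect a majority of "aye" votes.

    Assumption: a live voter says "aye" only if the candidate is at least as
    up to date as the voter itself.
    """
    majority = len(nodes) // 2 + 1
    live = [n for n in nodes if n.alive]
    winners = []
    for cand in live:
        if not cand.electable:
            continue
        ayes = sum(1 for v in live if v.last_applied <= cand.last_applied)
        if ayes >= majority:
            winners.append(cand.name)
    return winners


primary = Node("primary", last_applied=3, electable=True)
secondary = Node("secondary", last_applied=1, electable=True)
unelectable = Node("unelectable", last_applied=2, electable=False)
nodes = [primary, secondary, unelectable]

# Commit point when the unelectable node reports progress, and when it does not.
committed_today = commit_point(nodes, unelectable_report_progress=True)
committed_proposed = commit_point(nodes, unelectable_report_progress=False)

# The primary crashes; see who could still win an election.
primary.alive = False
candidates_after_crash = winnable_candidates(nodes)
```

In this model the commit point is T2 when the unelectable node reports progress and T1 when it does not, and the list of winnable candidates after the crash is empty. Withholding progress does not by itself make an election succeed here; it only prevents T2 from being acknowledged as majority-committed in the first place.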

The node should not vote "aye" either. While voting "aye" will not cause us to lose committed writes (assuming we do not update progress as above), it will cause the unelectable node to vote for nodes that cannot commit writes, since it cannot be part of a majority to help commit writes.



 Comments   
Comment by Judah Schvimer [ 05/Feb/20 ]

This ticket and SERVER-17934 have both been scoped down to being the same. Closing this one as a duplicate.

Comment by Judah Schvimer [ 03/Jan/18 ]

While voting "aye" will not cause us to lose committed writes (assuming we do not update progress as above), it will cause the unelectable node to vote for nodes that cannot commit writes, since it cannot be part of a majority to help commit writes.

If you're okay with voting for a primary that cannot commit majority writes, then I think it is fine to keep voting. Users may find this behavior surprising and it could lead to longer rollbacks. It could also lead to a primary being elected that cannot commit majority writes even if another node exists that could immediately commit majority writes if it were elected.

Comment by Spencer Brody (Inactive) [ 02/Jan/18 ]

judah.schvimer, thinking about this further, do we actually need to not vote "aye" or only to not report progress? If we stop reporting progress then we don't need to worry about incorrectly satisfying a w:majority write, but if we keep voting (initial sync could consider all other nodes ahead of us, rollback could vote with the last common point) then we don't risk reducing write availability unnecessarily.
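
A rough sketch of the trade-off under discussion, reusing the description's scenario after the primary has crashed. The vote rule, the state strings, and the helper names are hypothetical and only model the suggestion above (an initial-syncing node treats every candidate as ahead of it; a rollback node compares against its last common point); none of this is existing server behavior.

```python
def votes_aye(voter_state, voter_optime, candidate_optime, common_point=0):
    """Hypothetical vote rule for a node that keeps voting while unelectable."""
    if voter_state == "initial_sync":
        return True  # consider all other nodes ahead of us
    if voter_state == "rollback":
        return candidate_optime >= common_point  # vote with the last common point
    return candidate_optime >= voter_optime  # ordinary up-to-date check


def reports_progress(voter_state):
    """Under the proposal, only SECONDARY members report replication progress."""
    return voter_state == "secondary"


# (name, state, optime) for the voters still alive after the primary crashes.
live_voters = [("secondary", "secondary", 1), ("unelectable", "initial_sync", 2)]
total_voters = 3
majority = total_voters // 2 + 1

# The secondary (optime T1) stands for election.
candidate_optime = 1
ayes = sum(votes_aye(state, optime, candidate_optime)
           for _, state, optime in live_voters)
secondary_elected = ayes >= majority

# Once elected, the new primary counts itself plus every other live member
# that reports progress toward a majority write.
acknowledgers = 1 + sum(reports_progress(state)
                        for name, state, _ in live_voters if name != "secondary")
can_commit_majority = acknowledgers >= majority
```

With these assumptions the secondary wins the election with two of three votes, but the new primary cannot commit majority writes until the other node starts reporting progress again.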

Comment by Eric Milkie [ 09/Dec/17 ]

Also, I’m not sure users will expect that their commit level may stop moving after setting maintenance mode, if we make it stop reporting position.

Comment by Eric Milkie [ 09/Dec/17 ]

Arbiters are also “nodes that cannot become primary”. I don’t think you can prohibit them from voting “aye”.

Comment by Judah Schvimer [ 08/Dec/17 ]

On second thought, we'll also have to make sure that the reporter does not send our updated optime in its liveness updates.

Comment by Judah Schvimer [ 08/Dec/17 ]

This also would still allow priority 0 nodes to forward their progress, but that's fine since they can always reconfig the nodes to be electable if needed.

Maintenance Mode will not allow nodes to forward their progress, which is probably what we want anyways.

Comment by Judah Schvimer [ 08/Dec/17 ]

We can probably just add a check that we're in SECONDARY here. The only concern would be making sure that if we do the check and then immediately become SECONDARY, but never replicate another operation, we still update our sync source. Based on my reading of the Reporter, it sends progress periodically even without an update (for liveness updates presumably): https://github.com/mongodb/mongo/blob/2680f414b5fd303b93e48ff5a49fdf04535f05ec/src/mongo/db/repl/reporter.cpp#L293-L302
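
A minimal sketch of that check, as a toy model rather than the Reporter's actual code. MemberState, ProgressGate, and next_report are hypothetical names, and optimes are plain integers. While the node is not in SECONDARY, each periodic report repeats the last optime that was already sent; the first report after the transition to SECONDARY carries the held-back optime even though nothing new was replicated.

```python
from enum import Enum, auto


class MemberState(Enum):
    SECONDARY = auto()
    RECOVERING = auto()
    ROLLBACK = auto()
    STARTUP2 = auto()  # initial sync


class ProgressGate:
    """Decides which optime a periodic progress report may carry."""

    def __init__(self):
        self.last_reported = 0  # nothing reported upstream yet

    def next_report(self, state, last_applied):
        # Only a SECONDARY may advance the optime it reports upstream.
        if state is MemberState.SECONDARY:
            self.last_reported = last_applied
        # Otherwise the liveness update repeats the previously reported optime.
        return self.last_reported


gate = ProgressGate()
sent = [
    gate.next_report(MemberState.RECOVERING, 2),  # applied T2 while recovering
    gate.next_report(MemberState.RECOVERING, 2),  # periodic liveness update
    gate.next_report(MemberState.SECONDARY, 2),   # became SECONDARY, no new ops
]
```

Here the three reports carry optimes 0, 0, and 2: the sync source learns about T2 only once the node is in SECONDARY, and it does learn about it without a further replicated operation because the report is periodic.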

Generated at Thu Feb 08 04:29:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.