[DOCS-13929] Investigate changes in SERVER-43904: When stepping down, step up doesn't filter out frozen nodes
Created: 13/Oct/20  Updated: 13/Nov/23  Resolved: 19/Jul/21

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.4, 4.0.23, 4.2.13, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task
Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team
Assignee: Unassigned
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by DOCS-14074 [BACKPORT] [v4.4] When stepping down,... Closed
backported by DOCS-14127 [BACKPORT] [v4.0] When stepping down,... Closed
backports DOCS-14125 [BACKPORT] [v4.2] When stepping down,... Closed
Documented
documents SERVER-43904 When stepping down, step up doesn't f... Closed
Participants:
Days since reply: 2 years, 29 weeks, 2 days ago
Epic Link: DOCSP-9747

 Description   

Downstream Change Summary

I'm not sure whether we track what heartbeats consist of. Sorry if this doesn't actually need downstream team attention!

I added a field, 'electable', to heartbeat responses. It tells the recipient of the heartbeat response whether the responding node is currently electable as primary. When the primary looks for a secondary to step up during election handoff, it skips any node whose 'electable' field is false, since that node is not electable.
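To make the handoff behavior concrete, here is a minimal Python sketch of the selection logic. It is a simulation, not the server's implementation: the `HeartbeatResponse` class, the `caught_up` flag, and the `choose_handoff_candidate` function are all hypothetical names introduced for illustration. Only the 'electable' field comes from the change described above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HeartbeatResponse:
    """Hypothetical, simplified view of one member's latest heartbeat response."""
    host: str
    caught_up: bool   # assumption: stands in for whatever catch-up check the server applies
    electable: bool   # the field added by SERVER-43904


def choose_handoff_candidate(responses: Sequence[HeartbeatResponse]) -> Optional[str]:
    """Return the first caught-up member that reports itself electable, or None."""
    for hb in responses:
        if not hb.electable:
            # For example a frozen node: asking it to step up would fail, so skip it.
            continue
        if hb.caught_up:
            return hb.host
    return None
```

With this filter, a frozen but caught-up secondary is passed over in favor of the next electable one. If no member is electable, the function returns `None`, which corresponds to leaving the choice of a new primary to a normal election.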

Description of Linked Ticket

One of the recommended ways [0] to force a particular node to become primary is to freeze all non-candidate nodes and then call replSetStepDown on the primary. As of MongoDB 3.6, the stepdown code attempts to step up a candidate (by calling replSetStepUp). However, it doesn't exclude frozen nodes, and an attempt to step up a frozen node simply fails ("2019-10-09T00:24:05.517+0000 I REPL [conn352334] Not starting an election for a replSetStepUp request, since we are not electable due to: Not standing for election because I am still waiting for stepdown period to end at 2019-10-09T00:33:59.473+0000 (mask 0x20)"). This isn't particularly bad, since the unfrozen node will call for, and win, an election on its own, but it does make failovers slower (presumably by up to electionTimeoutMillis).
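For reference, the freeze-then-step-down procedure can be sketched as an ordered list of commands. The sketch below only builds the command documents and does not connect to a server. The host names, the `build_failover_plan` helper, and the 120-second durations are illustrative assumptions. `replSetFreeze` and `replSetStepDown` are the real command names, and each takes a number of seconds as its value.

```python
from typing import Dict, List, Sequence, Tuple

Command = Dict[str, int]


def build_failover_plan(
    primary: str,
    candidate: str,
    members: Sequence[str],
    freeze_secs: int = 120,    # assumption: illustrative duration
    stepdown_secs: int = 120,  # assumption: illustrative duration
) -> List[Tuple[str, Command]]:
    """Return (host, command) pairs in the order they would be issued."""
    if primary == candidate:
        raise ValueError("candidate must differ from the current primary")
    if primary not in members or candidate not in members:
        raise ValueError("primary and candidate must both be replica set members")

    plan: List[Tuple[str, Command]] = []
    # Freeze every member that is neither the primary nor the intended candidate.
    for host in members:
        if host not in (primary, candidate):
            plan.append((host, {"replSetFreeze": freeze_secs}))
    # Only then ask the primary to step down.
    plan.append((primary, {"replSetStepDown": stepdown_secs}))
    return plan
```

Each command would be run against the admin database of the named member. The ordering matters: the freezes go out before the stepdown, so that the candidate is the only member free to stand for election.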

An alternative approach that we're using, which isn't explicitly documented, is to increase the priority of both the current primary and the candidate node, and then run replSetStepDown. I've verified in both code and logs that this gets mongo to step up the candidate node consistently. It might be nice to document this approach, since I think it improves on both approaches currently mentioned. Increasing the priority of just the candidate works, but tends to be slower, since the "priority takeover" mechanism takes a few seconds to trigger, and it provides less control than an explicit replSetStepDown.
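A rough sketch of the priority-based approach, again as plain Python over a configuration document rather than a live connection. The `raise_priorities` helper, the host names, and the priority value of 2 are assumptions for illustration. The document shape (`version`, `members`, `host`, `priority`) follows the replica set configuration format.

```python
import copy
from typing import Any, Dict, Iterable


def raise_priorities(
    config: Dict[str, Any],
    hosts: Iterable[str],
    new_priority: float = 2.0,  # assumption: any value above the other members' priority
) -> Dict[str, Any]:
    """Return a copy of config with the given hosts' priority raised and version bumped."""
    wanted = set(hosts)
    updated = copy.deepcopy(config)  # leave the caller's document untouched
    found = set()
    for member in updated["members"]:
        if member["host"] in wanted:
            member["priority"] = new_priority
            found.add(member["host"])
    missing = wanted - found
    if missing:
        raise ValueError(f"hosts not in config: {sorted(missing)}")
    # A reconfig sent as a raw command needs a version higher than the current one.
    updated["version"] += 1
    return updated
```

The returned document would be passed to replSetReconfig on the current primary, with both the primary and the candidate in `hosts`, followed by replSetStepDown as described above.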

[0] https://docs.mongodb.com/manual/tutorial/force-member-to-be-primary/

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Jeffrey Allen [ 19/Jul/21 ]

Closing this ticket since we don't document the specific elements of heartbeats.
