[DOCS-10844] Stepdown command must take global lock in exclusive mode (SERVER-28544) Created: 27/Sep/17  Updated: 29/Oct/23  Resolved: 08/Feb/18

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 3.5.14

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Jonathan DeStefano
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-28544 Stepdown command must take global loc... Closed

 Description   

Documentation Request Summary:

The stepdown command no longer blocks writes while waiting for secondaries to catch up. Instead, writes will start failing with NotMaster immediately, but if no secondaries catch up within the secondary catch up time period then the node will step back up automatically and start accepting writes again.

Engineering Ticket Description:

Currently the stepdown command runs with the global lock in shared mode, which violates the concurrency rules for the _canAcceptNonLocalWrites variable.



 Comments   
Comment by Githook User [ 09/Feb/18 ]

Author: jonathan (jdestefano-mongo) <jonathan.destefano@10gen.com>

Message: DOCS-10844 - Writes now fail instead of getting blocked when running replSetStepDown or rs.stepDown().
Branch: master
https://github.com/mongodb/docs/commit/80f3effcf202dab94863ac7e2cbaf611f2bf5ade

Comment by Spencer Brody (Inactive) [ 22/Jan/18 ]

It comes down to how you define when a node is "stepped down". Is it "stepped down" whenever it cannot accept writes, or only when it stops reporting itself as being in PRIMARY state? Either way, I think your assertion that the crux of the behavior change is that writes fail instead of blocking while waiting for secondaries to catch up is correct. If the writes being done are retryable, then the driver will automatically attempt to retry them against the new primary.
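
To make the "fail instead of block" behavior concrete, here is a minimal application-level sketch of retrying a write that is rejected with NotMaster. It is a simulation only: `NotMasterError`, `FlakyPrimary`, and `write_with_retry` are hypothetical names invented for this illustration, not part of any MongoDB driver. A real driver handles retryable writes itself and raises its own exception types.

```python
import time


class NotMasterError(Exception):
    """Hypothetical stand-in for the NotMaster error a node returns."""


def write_with_retry(do_write, max_attempts=3, delay_seconds=0.0):
    """Call do_write(), retrying when it fails with NotMasterError.

    Returns the result of the first successful call and re-raises the
    last NotMasterError once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_write()
        except NotMasterError:
            if attempt == max_attempts:
                raise
            # Pause before retrying, e.g. to give an election time to finish.
            time.sleep(delay_seconds)


class FlakyPrimary:
    """Hypothetical node that rejects its first `failures` writes."""

    def __init__(self, failures):
        self.failures = failures
        self.docs = []

    def insert(self, doc):
        if self.failures > 0:
            self.failures -= 1
            raise NotMasterError("not master")
        self.docs.append(doc)
        return len(self.docs)
```

For example, `write_with_retry(lambda: node.insert({"x": 1}))` with `node = FlakyPrimary(failures=2)` succeeds on the third attempt, while a node that keeps rejecting writes causes the error to propagate to the caller instead of hanging.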

Comment by Jonathan DeStefano [ 22/Jan/18 ]

After reviewing the existing content and discussing it with Spencer, I believe a piece of the Documentation Request Summary is incorrect:

"... if no secondaries catch up within the secondary catch up time period then the node will step back up automatically and start accepting writes again."

I believe at this point of the procedure the primary has not yet stepped down. As per current documentation (which I believe to be correct):
"If no electable secondary meets this criterion by the waiting period, the primary does not step down and the command errors."

After which, it will be able to accept writes again.

After a cursory search, I believe the only change required for this ticket is to update:
From: (command/method) blocks all writes to the primary while it runs.
To: All writes to the primary will fail while the (command/method) runs.
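
The sequence described in this comment can be sketched as a small state model. This is an illustrative simulation under stated assumptions, not the server's implementation: the `Primary` class, the `step_down` signature, the polling loop, and both exception types are hypothetical. It shows only the documented outcome, namely that writes fail while the command runs and that, if no electable secondary catches up in the waiting period, the primary does not step down and the command errors.

```python
class NotMasterError(Exception):
    """Hypothetical stand-in for the NotMaster error returned to writers."""


class StepDownFailed(Exception):
    """Hypothetical error: no electable secondary caught up in time."""


class Primary:
    """Toy model of a primary handling a stepdown request."""

    def __init__(self):
        self.state = "PRIMARY"
        self.accepting_writes = True

    def write(self, doc):
        # Writes are rejected immediately rather than blocked.
        if not self.accepting_writes:
            raise NotMasterError("not master")
        return "ok"

    def step_down(self, secondary_caught_up, catch_up_polls):
        """Poll secondary_caught_up() up to catch_up_polls times."""
        self.accepting_writes = False  # writes fail while the command runs
        for _ in range(catch_up_polls):
            if secondary_caught_up():
                self.state = "SECONDARY"  # stepdown completes
                return
        # Waiting period expired: remain primary and accept writes again.
        self.accepting_writes = True
        raise StepDownFailed("no electable secondary caught up")
```

In this model the node reports `PRIMARY` for the whole waiting period even though it rejects writes, which matches the distinction drawn in the discussion between "cannot accept writes" and "no longer in PRIMARY state".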
