[SERVER-35442] stepdown global lock acquisition should use wait time, not freeze time Created: 06/Jun/18  Updated: 29/Oct/23  Resolved: 23/Jul/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.2, 4.1.2

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-07-16, Repl 2018-07-30

 Description   

When we added interruptibility to lock acquisitions, we chose the "stepDownUntil" deadline as the global lock acquisition timeout in ReplicationCoordinatorImpl::stepDown(). Despite its name, this variable is actually the freeze-time deadline: it dictates how long a node waits before attempting to become primary again, after the stepdown has finished and the function has returned.
Instead, we should be using the "waitUntil" deadline, which is the time the user is willing to wait for the stepdown to complete before it gives up and returns an error.
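The distinction between the two deadlines can be sketched in standalone C++. This is a minimal illustration, not MongoDB's actual code: the struct, the helper functions, and the use of std::timed_mutex as a stand-in for the global lock are all hypothetical. It also assumes, for illustration, that waitUntil comes from the command's catch-up wait period (secondaryCatchUpPeriodSecs) and stepDownUntil from its freeze period (stepDownSecs).

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical sketch; names are illustrative, not MongoDB's real code.
using Clock = std::chrono::steady_clock;

struct StepDownDeadlines {
    // How long the caller is willing to wait for the stepdown to complete
    // (assumed here to come from secondaryCatchUpPeriodSecs).
    Clock::time_point waitUntil;
    // How long the node stays ineligible to become primary after stepping
    // down, i.e. the freeze period (assumed to come from stepDownSecs).
    Clock::time_point stepDownUntil;
};

StepDownDeadlines computeDeadlines(Clock::time_point start,
                                   std::chrono::milliseconds waitTime,
                                   std::chrono::milliseconds stepDownTime) {
    return {start + waitTime, start + stepDownTime};
}

// Deadline for the global lock acquisition. The buggy version returned
// d.stepDownUntil; the fix described in this issue uses d.waitUntil.
Clock::time_point globalLockDeadline(const StepDownDeadlines& d) {
    return d.waitUntil;
}

// Stand-in for the global lock: give up (return false) once the
// caller's wait deadline passes, instead of waiting out the freeze period.
bool acquireGlobalLockForStepDown(std::timed_mutex& globalLock,
                                  const StepDownDeadlines& d) {
    return globalLock.try_lock_until(globalLockDeadline(d));
}
```

With a 10-second wait time and a 60-second freeze time, the lock attempt gives up after at most 10 seconds rather than 60, so the caller gets an error within the time it asked to wait.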

This function is used by both the replSetStepDown and shutdown commands, so this bug affects both.



 Comments   
Comment by Githook User [ 04/Aug/18 ]

Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@10gen.com>

Message: SERVER-35442 Use wait time in stepdown global lock acquisition

(cherry picked from commit ac93e387d998d28e493857a1eebb8a044738bbc0)
Branch: v4.0
https://github.com/mongodb/mongo/commit/82de714df63a768a6d682536fce1957e70ac1302

Comment by Githook User [ 23/Jul/18 ]

Author: Vesselina Ratcheva (vessy-mongodb) <vesselina.ratcheva@10gen.com>

Message: SERVER-35442 Use wait time in stepdown global lock acquisition
Branch: master
https://github.com/mongodb/mongo/commit/ac93e387d998d28e493857a1eebb8a044738bbc0

Comment by Eric Milkie [ 28/Jun/18 ]

This problem also exists in branches older than 4.0, but I don't consider it critical enough to backport a fix there.
