Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.7, 4.0.1, 4.1.1
Affects Version/s: None
Component/s: Testing Infrastructure
Labels:
- tig-resmoke

Backwards Compatibility:
Fully Compatible
Backport Requested:
v4.0, v3.6
Sprint:
TIG 2018-07-02, TIG 2018-07-16
Linked BF Score:
27
Story Points:
2
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The electionTimeoutMillis parameter for the ContinuousStepdown hook, used in the concurrency stepdown suites, is set to 5000 (5 seconds). We should increase it, per the discussion captured below:

> > On 2018/05/30 22:09:12, maxh wrote:
> > > [note] As mentioned in SERVER-34666, I don't think we should shorten the
> > > election timeout as it can lead to an election happening that isn't
> > > initiated by the StepdownThread due to heartbeats being delayed. I'm okay
> > > with keeping it as-is for now because it is consistent with the replica set
> > > configuration the JavaScript version would have used; however, I'd like for
> > > there to be a follow-up SERVER ticket to change it.
> > >
> > > https://jira.mongodb.org/browse/SERVER-34666?focusedCommentId=1873407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1873407
> >
> > For the followup ticket, do we just want to remove this value and use the
> > default, or set it to a higher timeout?
>
> I'm not sure - I'd like to get some input from Judah on it. I'm currently
> wondering if we really need to avoid setting the election timeout to 24 hours
> when all_nodes_electable=true. We're going to use the replSetStepUp command in
> the Python version of the StepdownThread to cause one of the secondaries to run
> for election anyway. If for some reason the replSetStepUp command fails, then
> the former primary will try and step back up after 10 seconds on its own anyway.
>
> https://github.com/mongodb/mongo/blob/r4.1.0/buildscripts/resmokelib/testing/fixtures/replicaset.py#L149-L154

If you only want elections to come from the StepdownThread, then I'd recommend
setting the election timeout to 24 hours. The replSetStepUp command should still
work, and if it fails for some reason, then no other node will try to run for
election. There's no real difference between the default 10 seconds and the
current 5 seconds except for the amount of flakiness you'd expect (not the
existence of flakiness that we're trying to remove completely).
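
To illustrate the recommendation, here is a minimal sketch of a replica set configuration document with the election timeout raised to 24 hours. It is not the resmoke fixture code: the helper name build_replset_config, its parameters, and the host strings are hypothetical, and only the settings.electionTimeoutMillis field and the general shape of the config document come from MongoDB itself.

```python
# Hypothetical helper, not the actual resmoke fixture: builds a replica set
# config document whose election timeout is long enough that nodes should not
# call elections on their own during a test run.
ONE_DAY_MILLIS = 24 * 60 * 60 * 1000


def build_replset_config(name, hosts, all_nodes_electable=True,
                         election_timeout_millis=ONE_DAY_MILLIS):
    members = []
    for index, host in enumerate(hosts):
        member = {"_id": index, "host": host}
        # Assumption: when not every node is electable, only the first
        # member keeps a non-zero priority.
        if index > 0 and not all_nodes_electable:
            member["priority"] = 0
        members.append(member)
    return {
        "_id": name,
        "version": 1,
        "members": members,
        "settings": {"electionTimeoutMillis": election_timeout_millis},
    }


config = build_replset_config(
    "rs0", ["localhost:20000", "localhost:20001", "localhost:20002"])
print(config["settings"]["electionTimeoutMillis"])  # 86400000
```

With a timeout this long, an election would be expected only when something explicitly asks a node to run, for example the stepdown thread issuing replSetStepUp against a secondary.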

causes

SERVER-36817 replSetFreeze command run by stepdown thread may fail when server is already primary

Closed

is related to

SERVER-30642 Raise election timeouts as a way to provide more stable replica set test topologies

Closed

related to

SERVER-36448 Disable election handoff in suites that use the ContinuousStepdown hook

Closed

SERVER-36451 ContinuousStepdown with killing nodes can hang due to not being able to start the primary

Closed

Assignee: Max Hirschhorn
Reporter: Jonathan Abrahams (Inactive)
Participants: Githook User, Jonathan Abrahams, Max Hirschhorn
Votes: 0
Watchers: 4

Created: Jun 04 2018 06:41:17 PM UTC
Updated: Oct 29 2023 10:31:07 PM UTC
Resolved: Jul 02 2018 03:37:20 PM UTC
Confidence Status Last Update: 20/Jun/18 10:21 PM
