Core Server / SERVER-35383

Increase electionTimeoutMillis for the ContinuousStepdown hook used in stepdown suites

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      TIG 2018-07-02, TIG 2018-07-16
    • Linked BF Score:
      27
    • Story Points:
      2

      Description

      The electionTimeoutMillis parameter for the ContinuousStepdown hook, used in the concurrency stepdown suites, is currently set to 5000 ms (5 seconds). We should increase it, per the code review discussion captured below:

      > > On 2018/05/30 22:09:12, maxh wrote:
      > > > [note] As mentioned in SERVER-34666, I don't think we should shorten the
      > > > election timeout as it can lead to an election happening that isn't
      > > > initiated by the StepdownThread due to heartbeats being delayed. I'm okay
      > > > with keeping it as-is for now because it is consistent with the replica
      > > > set configuration the JavaScript version would have used; however, I'd
      > > > like for there to be a follow-up SERVER ticket to change it.
      > > >
      > > > https://jira.mongodb.org/browse/SERVER-34666?focusedCommentId=1873407&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1873407
      > >
      > > For the followup ticket, do we just want to remove this value and use the
      > > default, or set it to a higher timeout?
      >
      > I'm not sure - I'd like to get some input from Judah on it. I'm currently
      > wondering if we really need to avoid setting the election timeout to 24 hours
      > when all_nodes_electable=true. We're going to use the replSetStepUp command in
      > the Python version of the StepdownThread to cause one of the secondaries to
      > run for election anyway. If for some reason the replSetStepUp command fails,
      > then the former primary will try and step back up after 10 seconds on its own
      > anyway.
      >
      > https://github.com/mongodb/mongo/blob/r4.1.0/buildscripts/resmokelib/testing/fixtures/replicaset.py#L149-L154

      If you only want elections to come from the StepdownThread, then I'd recommend
      setting the election timeout to 24 hours. The replSetStepUp command should still
      work, and if it fails for some reason, then no other node will try to run for
      election. There's no real difference between the default 10 seconds and the
      current 5 seconds except for the amount of flakiness you'd expect (not the
      existence of flakiness that we're trying to remove completely).
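
The recommendation above could be sketched roughly as follows. This is a minimal illustration, not the resmoke implementation: the helper `build_replset_settings` and its `stepdown_thread_drives_elections` flag are hypothetical names chosen for this example. The only things taken from the ticket are the `electionTimeoutMillis` setting, the 10-second default, and the 24-hour value.

```python
# Hypothetical helper; the names below are illustrative and are not taken
# from buildscripts/resmokelib.
DEFAULT_ELECTION_TIMEOUT_MILLIS = 10 * 1000   # default mentioned in the ticket: 10 seconds
ONE_DAY_MILLIS = 24 * 60 * 60 * 1000          # 24 hours expressed in milliseconds


def build_replset_settings(stepdown_thread_drives_elections, election_timeout_millis=None):
    """Return the `settings` sub-document of a replica set configuration.

    If the caller does not pass an explicit timeout, use 24 hours when the
    StepdownThread is expected to be the only source of elections, and the
    10-second default otherwise.
    """
    if election_timeout_millis is None:
        if stepdown_thread_drives_elections:
            election_timeout_millis = ONE_DAY_MILLIS
        else:
            election_timeout_millis = DEFAULT_ELECTION_TIMEOUT_MILLIS
    return {"electionTimeoutMillis": election_timeout_millis}


settings = build_replset_settings(stepdown_thread_drives_elections=True)
print(settings)
```

With a 24-hour timeout, a delayed heartbeat should no longer be able to trigger an election by itself, so any election observed in the suite would be expected to come from the StepdownThread's replSetStepUp call.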

              People

              Assignee:
              max.hirschhorn Max Hirschhorn
              Reporter:
              jonathan.abrahams Jonathan Abrahams