  1. Core Server
  2. SERVER-66023

Do not constantly reset election and liveness timers

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.1, 5.0.11, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Fully Compatible
    • v6.0, v5.3, v5.0, v4.4
    • Repl 2022-05-02, Repl 2022-05-16, Repl 2022-05-30
    • 135

      On secondaries, we cancel the election timeout whenever the primary's liveness is updated, which can happen on every oplog batch. On primaries, we cancel the liveness timeout whenever the oldest secondary liveness is updated, which can happen on every replSetUpdatePosition. It turns out that cancelling a timer, at least on Linux, is quite expensive (likely system call overhead), and we do this while holding the replication lock, which increases contention on that already-hot mutex.

      We can greatly reduce this with a class that handles "cancel and reschedule" by keeping track of the latest requested timeout time. A request to push the timeout forward only records the new time; when the already-scheduled timeout fires, the class reschedules itself for the recorded time rather than having rescheduled on every request. This means we get no cancels and one reschedule per timeout interval (not one for every minuscule bump forward of the timer).
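
      The idea above can be sketched roughly as follows. This is only an illustration, not the server's actual implementation: the names FakeScheduler, DelayableTimeout, schedule_at, and delay_until are hypothetical, the scheduler is a simulated stand-in for a real executor, and the sketch assumes the deadline only ever moves later (as it does when liveness is refreshed).

```python
import heapq
import itertools


class FakeScheduler:
    """Hypothetical simulated executor that counts schedule/cancel calls."""

    def __init__(self):
        self.now = 0
        self.schedule_calls = 0
        self.cancel_calls = 0
        self._queue = []  # min-heap of (when, handle, callback)
        self._handles = itertools.count()
        self._cancelled = set()

    def schedule_at(self, when, callback):
        self.schedule_calls += 1
        handle = next(self._handles)
        heapq.heappush(self._queue, (when, handle, callback))
        return handle

    def cancel(self, handle):
        # Stands in for the expensive operation the ticket wants to avoid.
        self.cancel_calls += 1
        self._cancelled.add(handle)

    def advance_to(self, when):
        # Run every callback that is due at or before `when`, in time order.
        while self._queue and self._queue[0][0] <= when:
            due, handle, callback = heapq.heappop(self._queue)
            self.now = due
            if handle not in self._cancelled:
                callback()
        self.now = when


class DelayableTimeout:
    """Hypothetical 'cancel and reschedule' helper that never cancels.

    Assumes deadlines only move later; an earlier deadline would fire late.
    """

    def __init__(self, scheduler, on_timeout):
        self._scheduler = scheduler
        self._on_timeout = on_timeout
        self._deadline = None  # latest requested expiry time
        self._armed = False    # True while an underlying timer is pending

    def delay_until(self, when):
        # Cheap path: just record the new deadline.
        self._deadline = when
        if not self._armed:
            self._armed = True
            self._scheduler.schedule_at(when, self._fire)

    def _fire(self):
        self._armed = False
        if self._deadline is None:
            return
        if self._scheduler.now < self._deadline:
            # The deadline was pushed forward since we armed: re-arm once.
            self._armed = True
            self._scheduler.schedule_at(self._deadline, self._fire)
        else:
            self._deadline = None
            self._on_timeout()


# Simulate 100 liveness updates, one per tick, each pushing the
# deadline 10 ticks into the future, then let the timeout expire.
fired = []
sched = FakeScheduler()
timeout = DelayableTimeout(sched, lambda: fired.append(sched.now))
for tick in range(100):
    sched.advance_to(tick)
    timeout.delay_until(tick + 10)
sched.advance_to(200)
```

      In this simulation the 100 updates result in no cancel calls and roughly one schedule call per timeout interval, and the timeout callback still runs at the last requested deadline. The cost is that the underlying timer wakes up once per interval even when the deadline has moved.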

            Assignee:
            matthew.russotto@mongodb.com Matthew Russotto
            Reporter:
            matthew.russotto@mongodb.com Matthew Russotto
            Votes:
            0
            Watchers:
            7
