Core Server: SERVER-29937

Make sure liveness timeouts cannot be missed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2.18, 3.4.11, 3.6.0-rc0
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4, v3.2
    • Sprint:
      Repl 2017-10-02
    • Linked BF Score:
      0

      Description

      In ReplicationCoordinatorImpl::_scheduleNextLivenessUpdate_inlock(), we do not schedule a new liveness update if the nextTimeout would be in the past. This is wrong; we should schedule an immediate liveness update in that case.
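      A minimal sketch of the difference, assuming `std::chrono::steady_clock` timestamps. The function names `nextLivenessDelayBuggy` and `nextLivenessDelayFixed` are hypothetical and are not the actual ReplicationCoordinatorImpl code. The buggy variant drops the update when the timeout is already in the past; the fixed variant clamps the delay to zero so the check runs immediately.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical aliases for illustration; not MongoDB's actual types.
using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Behaviour described in the ticket: if the next timeout is already in the
// past, no liveness update is scheduled at all (std::nullopt).
std::optional<Millis> nextLivenessDelayBuggy(Clock::time_point now,
                                             Clock::time_point nextTimeout) {
    if (nextTimeout < now) {
        return std::nullopt;  // liveness checks silently stop here
    }
    return std::chrono::duration_cast<Millis>(nextTimeout - now);
}

// Fixed behaviour: a timeout in the past means "check immediately", so the
// delay is clamped to zero instead of the update being dropped.
Millis nextLivenessDelayFixed(Clock::time_point now,
                              Clock::time_point nextTimeout) {
    if (nextTimeout <= now) {
        return Millis(0);  // already due or overdue: run the check now
    }
    const Millis delay = std::chrono::duration_cast<Millis>(nextTimeout - now);
    assert(delay >= Millis(0));  // never schedule into the past
    return delay;
}
```

      The design choice is that "late" is treated as "due now" rather than as "nothing to do", so the scheduler always has a pending liveness check.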

      One scenario: we have just run our liveness check, and the earliest live member was only barely fresh ("almost stale"), so we take no action. A small amount of time passes before we schedule the next check, and by then that member is stale, so the next timeout is already in the past. Because no new update is scheduled in that case, we stop doing liveness checks altogether.
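      The scenario can be replayed with a hypothetical integer-millisecond model. The `Member` struct, `scheduleNext`, `memberEventuallyMarkedDown`, the 100 ms timeout, and the 5 ms scheduling delay are illustrative assumptions, not MongoDB's implementation. With the buggy scheduling the stale member is never marked down; with the fix, the immediate check catches it.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model; names and numbers are illustrative only.
struct Member {
    int64_t lastUpdateMs;  // last time we heard from this member
    bool markedDown;
};

// Returns when the next liveness check should run, or nullopt if none is scheduled.
std::optional<int64_t> scheduleNext(int64_t nowMs, int64_t nextTimeoutMs, bool fixed) {
    if (nextTimeoutMs < nowMs) {
        if (!fixed) return std::nullopt;  // bug: no further checks
        return nowMs;                     // fix: check immediately
    }
    return nextTimeoutMs;
}

// The check at checkMs finds the member 1 ms short of stale; the next check
// is scheduled a few milliseconds later, after the member has become stale.
bool memberEventuallyMarkedDown(bool fixed) {
    const int64_t livenessTimeoutMs = 100;
    const int64_t checkMs = 1000;
    Member m{checkMs - livenessTimeoutMs + 1, false};  // lastUpdate = 901

    // First check: the member is still fresh (99 ms < 100 ms), so nothing happens.
    assert(checkMs - m.lastUpdateMs < livenessTimeoutMs);

    // A little time passes before the next check is scheduled.
    const int64_t scheduleAtMs = checkMs + 5;                          // 1005
    const int64_t nextTimeoutMs = m.lastUpdateMs + livenessTimeoutMs;  // 1001: already past
    const std::optional<int64_t> next = scheduleNext(scheduleAtMs, nextTimeoutMs, fixed);
    if (!next) return false;  // liveness checks have stopped for good

    // The scheduled check runs and finds the member stale.
    if (*next - m.lastUpdateMs >= livenessTimeoutMs) m.markedDown = true;
    return m.markedDown;
}
```

      Here `memberEventuallyMarkedDown(false)` returns `false` and `memberEventuallyMarkedDown(true)` returns `true`.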

    People

    • Votes: 0
    • Watchers: 5