Core Server / SERVER-45880

Flow Control lag detection mechanism can overstate lag if there are oplog holes

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Execution Team 2020-07-27, Execution Team 2021-02-08, Execution Team 2021-02-22

      Description

      Flow Control uses the lastApplied wall clock time minus the lastCommitted wall clock time as a proxy for replication lag. This measure can overstate the lag when there are oplog holes: lastApplied can include operations that sit after a hole, and secondaries cannot replicate those operations while the hole exists.
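
      To make the overstatement concrete, here is a minimal Python sketch. It is not server code: `OplogEntry`, the helper functions, and the sample values are all hypothetical, and an oplog hole is modeled simply as an entry whose slot is reserved but whose write has not yet committed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OplogEntry:
    ts: int            # logical timestamp (a plain integer in this sketch)
    wall_secs: float   # wall clock time of the write, in seconds
    committed: bool    # False models an oplog hole: slot reserved, write in flight


def last_applied_wall(oplog: List[OplogEntry]) -> float:
    """Wall time of the newest committed entry, even if it sits after a hole."""
    return max(e.wall_secs for e in oplog if e.committed)


def no_holes_wall(oplog: List[OplogEntry]) -> float:
    """Wall time of the newest committed entry that has no hole before it."""
    wall = 0.0
    for entry in sorted(oplog, key=lambda e: e.ts):
        if not entry.committed:
            break  # nothing past the first hole can be replicated yet
        wall = entry.wall_secs
    return wall


def lag_proxy_secs(upper_wall: float, last_committed_wall: float) -> float:
    """The lag proxy: an upper wall clock time minus the lastCommitted wall time."""
    return upper_wall - last_committed_wall


oplog = [
    OplogEntry(ts=1, wall_secs=100.0, committed=True),
    OplogEntry(ts=2, wall_secs=101.0, committed=False),  # the hole
    OplogEntry(ts=3, wall_secs=105.0, committed=True),
]
# Assumption: secondaries have already majority-committed ts=1.
last_committed_wall = 100.0

lag_with_last_applied = lag_proxy_secs(last_applied_wall(oplog), last_committed_wall)
lag_without_holes = lag_proxy_secs(no_holes_wall(oplog), last_committed_wall)
print(lag_with_last_applied, lag_without_holes)
```

      In this toy scenario the secondaries have replicated everything available to them, yet the lastApplied-based proxy reports 5 seconds of lag, while a measure that stops at the first hole reports none.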

      One proposed fix is to use the wall clock time associated with the all_durable timestamp or the oplog visibility point instead of the lastApplied wall clock time, since neither point includes operations after oplog holes.

      Any solution that changes the components of the lag detection mechanism should ensure that 1) a wall clock time is available for the proposed timestamp, and 2) the proposed timestamp is accessible in memory and kept up to date.
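
      The two requirements can be illustrated with a small in-memory tracker. This is only a sketch under stated assumptions: `NoHolesPointTracker` and its methods are hypothetical names, not a MongoDB or WiredTiger API, and it assumes timestamps are reserved in increasing order before any later write commits.

```python
import threading
from typing import Dict, Optional


class NoHolesPointTracker:
    """Hypothetical tracker for the newest timestamp with no holes before it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # ts -> wall clock seconds; None while the write is still in flight.
        self._pending: Dict[int, Optional[float]] = {}
        self._point_ts = 0
        self._point_wall: Optional[float] = None

    def reserve(self, ts: int) -> None:
        """Record that a slot was allocated (a hole until it commits)."""
        with self._lock:
            self._pending[ts] = None

    def commit(self, ts: int, wall_secs: float) -> None:
        """Fill a slot, then advance the point past every leading filled slot."""
        with self._lock:
            self._pending[ts] = wall_secs
            for t in sorted(self._pending):
                wall = self._pending[t]
                if wall is None:
                    break  # first remaining hole: the point cannot pass it
                # Requirement 1: the wall time is stored alongside the timestamp.
                self._point_ts, self._point_wall = t, wall
                del self._pending[t]

    def lag_secs(self, last_committed_wall: float) -> Optional[float]:
        """Requirement 2: an in-memory read of the up-to-date point."""
        with self._lock:
            if self._point_wall is None:
                return None
            return self._point_wall - last_committed_wall


tracker = NoHolesPointTracker()
for ts in (1, 2, 3):
    tracker.reserve(ts)
tracker.commit(1, 100.0)
tracker.commit(3, 105.0)                   # ts=2 is still a hole
lag_during_hole = tracker.lag_secs(100.0)
tracker.commit(2, 101.0)                   # hole filled, point jumps to ts=3
lag_after_fill = tracker.lag_secs(100.0)
print(lag_during_hole, lag_after_fill)
```

      Storing the wall clock time next to each timestamp means no oplog read is needed to compute the lag; the cost is the bookkeeping on every reserve and commit.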

      SERVER-46114 represents another case for reconsidering whether lastApplied minus lastCommitted is the best measure for lag.

              People

              Assignee:
              dianna.hohensee Dianna Hohensee
              Reporter:
              maria.vankeulen Maria van Keulen
              Participants:
              Votes:
              0
              Watchers:
              20
