Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-34942

Stuck with cache full during oplog replay in initial sync

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.6.7, 4.0.1, 4.1.2
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0, v3.6
    • Sprint:
      Storage NYC 2018-07-16, Storage NYC 2018-07-30
    • Case:
    • Linked BF Score:
      39

      Description

      The oldest timestamp is only advanced at the end of every batch during oplog replay in initial sync. This means that all dirty content generated by the application of the operations in a single batch will be pinned in cache. If the batch is large enough and the operations are heavy enough this dirty content can exceed eviction_dirty_trigger (default 20% of cache) and the rate of applying operations will become dramatically slower because it has to wait for the dirty data to be reduced below the threshold.

      In extreme cases the node can become completely stuck due to full cache preventing a batch from completing and unpinning the data that is keeping the cache full (although I'm not sure if that's a necessary consequence of this or a failure of the lookaside mechanism to keep the node from getting completely stuck.)

      This is similar to SERVER-34938, but I believe oplog application during initial sync is a different codepath from normal replication. If not feel free to close as a dup.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              benety.goh Benety Goh
              Reporter:
              bruce.lucas Bruce Lucas
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: