Commit point can be stale on slaveDelay nodes and cause memory pressure

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.0.12
    • Component/s: WiredTiger
    • Environment:
    • Replication
    • ALL
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      After restarting a stale slaveDelay node, the learned higher OpTime in the same term can be forgotten. If the commit point is in a higher term on all other nodes, the slaveDelay node can only advance its commit point on getMore responses due to SERVER-39831. On slaveDelay nodes, it's likely the buffer is already full, so the applier has to apply 16MB worth of oplog entries to make the room for bgsync to insert the last fetched batch and call another getMore. Applying 16MB oplog entires may be enough to trigger memory pressure, causing evictions.

      The issue will resolve when the slaveDelay node starts to apply oplog entries from the latest term. Memory pressure and evictions on slaveDelay nodes are undesired but not harmful.

      The same issue can happen without restart. Let's say an election happens in term 8 at time T0, but the node delays by 5 days and is still applying entries from term 7. At T0 + 2 days, another election occurs in term 9. Now the commit point is in term 9. At T0 + 5 days, when the delayed node starts to apply entries in term 8, it cannot advance its commit point beyond its last applied. Eventually, when the node starts to apply entries from term 9, everything's fine again.

      =======================================
      Original title and description:
      WT eviction threads consume a lot of CPU even when there is no apparent cache pressure

      After upgrading from 3.6 to 4.0.12 we encountered an overly high CPU consumption on our slave-delayed hidden replica set member. Restarting the member doesn't help, the CPU consumption goes down, but then goes up after some time.
      We recorded some logs, perf traces and statistics snapshots, see attached files. Also included are FTDC files for the relevant interval and some graphs from our monitoring system.

      "Before" means before the CPU spike, "after" – after it (occured about 15:47:31 +/- 5s).

      When CPU consumption is high, according to `perf report` about 96% of time is spent in `__wt_evict` (see `mongod-after.perf.txt` and `mongod-after.perf.data`). This coincides with `cache overflow score` metric jumping up from 0 to 100 (see `caches-before.log` and `caches-after.log`), despite the `bytes currently in the cache` (5703522791) being much smaller than `maximum bytes configured` (8589934592).

      This is a hidden delayed secondary, so there should be next to no load except replicating writes which are pretty low-volume. Before upgrading to 4.0 we did not have any issues regarding this service.

        1. scorpius-cache.png
          24 kB
          Aristarkh Zagorodnikov
        2. scorpius-cpu.png
          24 kB
          Aristarkh Zagorodnikov
        3. scorpius-cpu2.png
          35 kB
          Aristarkh Zagorodnikov
        4. caches.log
          1.41 MB
          Aristarkh Zagorodnikov
        5. metrics.2019-08-16T12-02-49Z-00000
          644 kB
          Aristarkh Zagorodnikov
        6. mongod.log
          296 kB
          Aristarkh Zagorodnikov
        7. mongod-before.perf.data
          1.74 MB
          Aristarkh Zagorodnikov
        8. mongod-after.perf.data
          27.98 MB
          Aristarkh Zagorodnikov
        9. mongodb-driveFS-files-1.conf
          0.6 kB
          Aristarkh Zagorodnikov
        10. mongod-after.perf.txt
          30 kB
          Aristarkh Zagorodnikov
        11. mongod-after.perf.txt
          30 kB
          Aristarkh Zagorodnikov
        12. scorpius-cpu-new.png
          52 kB
          Aristarkh Zagorodnikov
        13. Screen Shot 2019-09-24 at 1.23.47 PM.png
          188 kB
          Danny Hatcher
        14. overview.png
          305 kB
          Bruce Lucas
        15. zoom.png
          258 kB
          Bruce Lucas
        16. Screen Shot 2019-10-09 at 12.47.34 PM.png
          282 kB
          Danny Hatcher

            Assignee:
            [DO NOT USE] Backlog - Replication Team
            Reporter:
            Aristarkh Zagorodnikov
            Votes:
            1 Vote for this issue
            Watchers:
            15 Start watching this issue

              Created:
              Updated:
              Resolved: