Core Server / SERVER-43038

Commit point can be stale on slaveDelay nodes and cause memory pressure

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.0.12
    • Component/s: WiredTiger, Replication
    • Operating System: ALL

      After restarting a stale slaveDelay node, the higher OpTime it had learned in the same term can be forgotten. If the commit point is in a higher term on all other nodes, the slaveDelay node can only advance its commit point on getMore responses due to SERVER-39831. On slaveDelay nodes the oplog buffer is likely already full, so the applier has to apply 16MB worth of oplog entries to make room for bgsync to insert the last fetched batch and issue another getMore. Applying 16MB of oplog entries may be enough to trigger memory pressure, causing evictions.
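
      For illustration, here is a rough Python sketch of the behaviour described above (this is not the server's actual C++ logic; OpTimes are modelled as simple (term, timestamp) pairs): a commit point learned in a newer term is clamped to the node's lastApplied, so a delayed node keeps a stale commit point until it starts applying entries from that term.

          from collections import namedtuple

          # OpTimes compare lexicographically: term first, then timestamp.
          OpTime = namedtuple("OpTime", ["term", "timestamp"])

          def advance_commit_point(last_committed, learned, last_applied):
              # Sketch only: a commit point learned from a term the node has not
              # applied yet is clamped to lastApplied, so it cannot move forward
              # until the node reaches that term.
              if learned <= last_committed:
                  return last_committed
              if learned.term > last_applied.term:
                  return max(last_committed, min(learned, last_applied))
              return learned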

      The issue resolves itself once the slaveDelay node starts applying oplog entries from the latest term. Memory pressure and evictions on slaveDelay nodes are undesirable but not harmful.

      The same issue can happen without a restart. Say an election happens in term 8 at time T0, but the node is delayed by 5 days and is still applying entries from term 7. At T0 + 2 days, another election occurs in term 9, so the commit point is now in term 9. At T0 + 5 days, when the delayed node starts applying entries from term 8, it cannot advance its commit point beyond its last applied OpTime. Eventually, when the node starts applying entries from term 9, everything is fine again.
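
      Plugging the timeline above into the sketch (the timestamp values here are illustrative):

          # The delayed node is still applying term-7 entries when it learns a
          # term-9 commit point: the commit point stays clamped at lastApplied.
          last_applied = OpTime(term=7, timestamp=1000)
          learned      = OpTime(term=9, timestamp=5000)
          commit_point = advance_commit_point(OpTime(7, 900), learned, last_applied)
          assert commit_point == last_applied  # stale until term 9 is reached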

      =======================================
      Original title and description:
      WT eviction threads consume a lot of CPU even when there is no apparent cache pressure

      After upgrading from 3.6 to 4.0.12 we encountered abnormally high CPU consumption on our slave-delayed hidden replica set member. Restarting the member doesn't help: the CPU consumption goes down, but then goes back up after some time.
      We recorded some logs, perf traces, and statistics snapshots; see the attached files. Also included are FTDC files for the relevant interval and some graphs from our monitoring system.

      "Before" means before the CPU spike, "after" – after it (occured about 15:47:31 +/- 5s).

      When CPU consumption is high, according to `perf report` about 96% of the time is spent in `__wt_evict` (see `mongod-after.perf.txt` and `mongod-after.perf.data`). This coincides with the `cache overflow score` metric jumping from 0 to 100 (see `caches-before.log` and `caches-after.log`), despite `bytes currently in the cache` (5703522791) being much smaller than `maximum bytes configured` (8589934592).
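
      For anyone reproducing this, the metrics quoted above can be read from serverStatus; a minimal pymongo sketch follows (the hostname is illustrative, and the exact set of wiredTiger.cache counters varies by version):

          from pymongo import MongoClient

          client = MongoClient("mongodb://delayed.example.net:27017")  # hypothetical host
          cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]

          print(cache.get("bytes currently in the cache"))  # 5703522791 in this report
          print(cache.get("maximum bytes configured"))      # 8589934592 (8 GiB)
          print(cache.get("cache overflow score"))          # jumped from 0 to 100 here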

      This is a hidden delayed secondary, so there should be next to no load apart from replicating writes, which are pretty low-volume. Before upgrading to 4.0 we did not have any issues of this kind with this node.
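
      For context, a hidden delayed secondary like this one is typically configured with a member entry along these lines (field names as in the 4.0 replica set configuration document; the host and delay values are made up, not taken from this deployment):

          # Illustrative rs.conf() member entry for a hidden, delayed secondary.
          member = {
              "_id": 2,
              "host": "delayed.example.net:27017",  # hypothetical host
              "priority": 0,       # delayed/hidden members must not be electable
              "hidden": True,      # not visible to client reads
              "slaveDelay": 3600,  # apply oplog entries one hour behind the primary
              "votes": 1,
          }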

        1. caches.log
          1.41 MB
          Aristarkh Zagorodnikov
        2. metrics.2019-08-16T12-02-49Z-00000
          644 kB
          Aristarkh Zagorodnikov
        3. mongod.log
          296 kB
          Aristarkh Zagorodnikov
        4. mongod-after.perf.data
          27.98 MB
          Aristarkh Zagorodnikov
        5. mongod-after.perf.txt
          30 kB
          Aristarkh Zagorodnikov
        6. mongodb-driveFS-files-1.conf
          0.6 kB
          Aristarkh Zagorodnikov
        7. mongod-before.perf.data
          1.74 MB
          Aristarkh Zagorodnikov
        8. overview.png
          305 kB
          Bruce Lucas
        9. scorpius-cache.png
          24 kB
          Aristarkh Zagorodnikov
        10. scorpius-cpu.png
          24 kB
          Aristarkh Zagorodnikov
        11. scorpius-cpu2.png
          35 kB
          Aristarkh Zagorodnikov
        12. scorpius-cpu-new.png
          52 kB
          Aristarkh Zagorodnikov
        13. Screen Shot 2019-09-24 at 1.23.47 PM.png
          188 kB
          Danny Hatcher
        14. Screen Shot 2019-10-09 at 12.47.34 PM.png
          282 kB
          Danny Hatcher
        15. zoom.png
          258 kB
          Bruce Lucas

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: onyxmaster (Aristarkh Zagorodnikov)
            Votes: 1
            Watchers: 15
