Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-43038

Commit point can be stale on slaveDelay nodes and cause memory pressure

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.0.12
    • Component/s: WiredTiger
    • Environment:
    • Replication
    • ALL

      After restarting a stale slaveDelay node, the learned higher OpTime in the same term can be forgotten. If the commit point is in a higher term on all other nodes, the slaveDelay node can only advance its commit point on getMore responses due to SERVER-39831. On slaveDelay nodes, it's likely the buffer is already full, so the applier has to apply 16MB worth of oplog entries to make the room for bgsync to insert the last fetched batch and call another getMore. Applying 16MB oplog entires may be enough to trigger memory pressure, causing evictions.

      The issue will resolve when the slaveDelay node starts to apply oplog entries from the latest term. Memory pressure and evictions on slaveDelay nodes are undesired but not harmful.

      The same issue can happen without restart. Let's say an election happens in term 8 at time T0, but the node delays by 5 days and is still applying entries from term 7. At T0 + 2 days, another election occurs in term 9. Now the commit point is in term 9. At T0 + 5 days, when the delayed node starts to apply entries in term 8, it cannot advance its commit point beyond its last applied. Eventually, when the node starts to apply entries from term 9, everything's fine again.

      Original title and description:
      WT eviction threads consume a lot of CPU even when there is no apparent cache pressure

      After upgrading from 3.6 to 4.0.12 we encountered an overly high CPU consumption on our slave-delayed hidden replica set member. Restarting the member doesn't help, the CPU consumption goes down, but then goes up after some time.
      We recorded some logs, perf traces and statistics snapshots, see attached files. Also included are FTDC files for the relevant interval and some graphs from our monitoring system.

      "Before" means before the CPU spike, "after" – after it (occured about 15:47:31 +/- 5s).

      When CPU consumption is high, according to `perf report` about 96% of time is spent in `__wt_evict` (see `mongod-after.perf.txt` and `mongod-after.perf.data`). This coincides with `cache overflow score` metric jumping up from 0 to 100 (see `caches-before.log` and `caches-after.log`), despite the `bytes currently in the cache` (5703522791) being much smaller than `maximum bytes configured` (8589934592).

      This is a hidden delayed secondary, so there should be next to no load except replicating writes which are pretty low-volume. Before upgrading to 4.0 we did not have any issues regarding this service.

        1. caches.log
          1.41 MB
        2. metrics.2019-08-16T12-02-49Z-00000
          644 kB
        3. metrics-csrs-primary.zip
          83.09 MB
        4. metrics-csrs-primary-2.zip
          59.64 MB
        5. metrics-shard-primary.zip
          90.98 MB
        6. mongod.log
          296 kB
        7. mongod-after.perf.data
          27.98 MB
        8. mongod-after.perf.txt
          30 kB
        9. mongod-after.perf.txt
          30 kB
        10. mongodb-driveFS-files-1.conf
          0.6 kB
        11. mongod-before.perf.data
          1.74 MB
        12. mongod-new-logs.zip
          4.95 MB
        13. mongod-new-logs-2.zip
          555 kB
        14. mongo-metrics.zip
          75.15 MB
        15. overview.png
          305 kB
        16. scorpius-cache.png
          24 kB
        17. scorpius-cpu.png
          24 kB
        18. scorpius-cpu2.png
          35 kB
        19. scorpius-cpu-new.png
          52 kB
        20. Screen Shot 2019-09-24 at 1.23.47 PM.png
          Screen Shot 2019-09-24 at 1.23.47 PM.png
          188 kB
        21. Screen Shot 2019-10-09 at 12.47.34 PM.png
          Screen Shot 2019-10-09 at 12.47.34 PM.png
          282 kB
        22. zoom.png
          258 kB

            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            onyxmaster Aristarkh Zagorodnikov
            1 Vote for this issue
            15 Start watching this issue