WiredTiger / WT-2647

Improve page-out behavior for OpLog

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      The current heuristic for append-only tables (such as the MongoDB oplog) evicts recently written pages from memory to avoid polluting the cache. As a result, when such a page is read again it must first be fetched from disk (or the OS cache) and then decompressed before results can be returned. On a system with a high write load, this adds work to the critical path for getting oplog data to downstream nodes in a replica set.

      One solution we discussed in person would be to hold the uncompressed page images for the oplog in cache after they have been written to disk. MongoDB could then periodically inform WT of the lowest position that is still "interesting", so that older pages can still be evicted rapidly. "Interesting" can be loosely defined as the position of the furthest-behind replica that is still up, possibly also considering the positions of any oplog-tailing cursors created by users.
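
      To make the proposal concrete, here is a rough Python sketch of the bookkeeping it implies. It is a toy model only and not WiredTiger or MongoDB code: the names `lowest_interesting_position`, `RetainedOplogPages`, `page_written`, and `set_lowest_interesting` are hypothetical, and positions are modeled as plain integers purely for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


def lowest_interesting_position(replica_positions, cursor_positions=()):
    """Return the lowest oplog position any live consumer still needs.

    replica_positions: positions of replicas that are currently up.
    cursor_positions: positions of user-created oplog-tailing cursors.
    Returns None when there are no consumers at all.
    """
    positions = list(replica_positions) + list(cursor_positions)
    return min(positions) if positions else None


@dataclass
class RetainedOplogPages:
    """Toy model of uncompressed page images kept after being written to disk."""

    # Maps the last oplog position on a page to its uncompressed image.
    # Assumption: pages are appended in increasing position order.
    pages: OrderedDict = field(default_factory=OrderedDict)

    def page_written(self, last_position, image):
        """Record a page image that has just been written to disk."""
        self.pages[last_position] = image

    def set_lowest_interesting(self, position):
        """Drop every page whose entries all precede `position`.

        Returns the last positions of the pages that were dropped.
        """
        evicted = []
        while self.pages:
            oldest_last_position = next(iter(self.pages))
            if oldest_last_position >= position:
                break  # this page may still hold an entry someone needs
            self.pages.popitem(last=False)
            evicted.append(oldest_last_position)
        return evicted


cache = RetainedOplogPages()
for last_position in (100, 200, 300):
    cache.page_written(last_position, b"uncompressed page image")

# Two replicas are up, at 250 and 310; one user cursor is tailing at 180.
watermark = lowest_interesting_position([250, 310], [180])
evicted = cache.set_lowest_interesting(watermark)
```

      In this sketch the slowest consumer (the cursor at 180) sets the watermark, so only the page ending at 100 is dropped, and the pages ending at 200 and 300 stay available without a disk read or decompression. How a real implementation would bound memory when a consumer falls far behind is left open here.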

            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            mathias@mongodb.com Mathias Stearn