Force eviction of ingest tree pages can stall write (oplog) threads


    • Storage Engines, Storage Engines - Persistence

      Analysis of disagg workloads in which a standby replicates the primary workload indicates that oplog application threads block while attempting to evict pages from the ingest tree. The ingest tree pages are likely exceeding memory_page_max and therefore qualify for forced eviction. However, these pages are commonly hot and have active hazard pointers on them, or are ahead of the prune_timestamp, so eviction of the page is likely either to fail or to be ineffective.
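      The conditions described above can be sketched as a small classifier. This is only an illustration, not the storage engine's code: `page_view`, `evict_outlook` and `forced_eviction_outlook` are hypothetical names, and treating a page as "ahead of the prune_timestamp" whenever its newest update is newer than that timestamp is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of an in-memory ingest tree page. */
struct page_view {
    size_t footprint_bytes; /* current in-memory size of the page */
    unsigned hazard_refs;   /* threads currently holding hazard pointers */
    uint64_t newest_ts;     /* newest update timestamp on the page (assumed field) */
};

enum evict_outlook {
    EVICT_NOT_REQUESTED, /* page is within memory_page_max */
    EVICT_LIKELY_FAILS,  /* oversized, but pinned by hazard pointers */
    EVICT_INEFFECTIVE,   /* oversized, but content is ahead of the prune timestamp */
    EVICT_WORTHWHILE     /* oversized, unpinned, and prunable */
};

/* Classify what a forced eviction attempt on this page is likely to achieve. */
static enum evict_outlook
forced_eviction_outlook(
    const struct page_view *page, size_t memory_page_max, uint64_t prune_ts)
{
    assert(page != NULL);

    /* Only pages above the size limit qualify for forced eviction. */
    if (page->footprint_bytes <= memory_page_max)
        return EVICT_NOT_REQUESTED;

    /* Exclusive access cannot be obtained while hazard pointers are active. */
    if (page->hazard_refs > 0)
        return EVICT_LIKELY_FAILS;

    /* Content newer than the prune timestamp cannot be pruned yet. */
    if (page->newest_ts > prune_ts)
        return EVICT_INEFFECTIVE;

    return EVICT_WORTHWHILE;
}
```

      In this model, a hot page that exceeds the limit keeps returning `EVICT_LIKELY_FAILS` or `EVICT_INEFFECTIVE`, which corresponds to the wasted attempts seen in the analysis.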

      This creates a very bad pattern: repeated forced eviction attempts take exclusive access to the page and block the oplog application threads, which then lag further and further behind the primary. A follow-on effect is that the prune_timestamp advances less, because fewer checkpoints are picked up.

      Two solutions to this issue have been attempted:

      1. Add an additional prune timestamp check prior to taking exclusive access on the page.
      2. Prevent application threads from force evicting ingest tree pages.

      The first solution showed little to no impact; the second almost entirely resolves lag in the workloads tested. As resolving this issue is high priority, we will implement the second solution (preventing application threads from force evicting ingest tree pages) for the time being.
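      A minimal sketch of how the two attempted solutions differ as a gate in front of forced eviction. All names here (`fix_variant`, `thread_kind`, `evict_candidate`, `should_attempt_forced_eviction`) are hypothetical and do not reflect the actual patch. The sketch assumes both changes apply only to ingest tree pages and that non-application threads are unaffected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical labels for the thread making the eviction attempt. */
enum thread_kind { THREAD_APPLICATION, THREAD_EVICTION_WORKER };

/* Hypothetical labels for the two attempted solutions. */
enum fix_variant {
    FIX_NONE,              /* original behaviour */
    FIX_PRUNE_CHECK,       /* solution 1: check the prune timestamp first */
    FIX_NO_APP_FORCE_EVICT /* solution 2: application threads never force evict */
};

/* Hypothetical summary of a page being considered for forced eviction. */
struct evict_candidate {
    bool is_ingest_page; /* page belongs to the ingest tree */
    bool over_page_max;  /* page exceeds memory_page_max */
    uint64_t newest_ts;  /* newest update timestamp on the page (assumed field) */
};

/* Decide whether to go on and take exclusive access for forced eviction. */
static bool
should_attempt_forced_eviction(enum fix_variant fix, enum thread_kind who,
    const struct evict_candidate *c, uint64_t prune_ts)
{
    /* Pages within the size limit never qualify. */
    if (!c->over_page_max)
        return false;

    /* In this sketch, neither fix changes behaviour for other trees. */
    if (!c->is_ingest_page)
        return true;

    switch (fix) {
    case FIX_PRUNE_CHECK:
        /* Skip pages whose content is still ahead of the prune timestamp. */
        return c->newest_ts <= prune_ts;
    case FIX_NO_APP_FORCE_EVICT:
        /* Leave oversized ingest pages to regular eviction. */
        return who != THREAD_APPLICATION;
    case FIX_NONE:
    default:
        return true;
    }
}
```

      With `FIX_NO_APP_FORCE_EVICT`, an oplog application thread never reaches the exclusive-access step on an ingest tree page, whatever state the page is in.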

      There is a potential downside to the fix: a page that is never evicted could grow extremely large in memory and eventually cause different issues. However, tests show that this isn't the case, at least not for the relevant workloads. Regular eviction does succeed in evicting and splitting the page, which resolves the issue.

      I will add FTDC in the comments of this ticket.

        1. image-2026-03-27-16-33-40-498.png (169 kB)
        2. image-2026-03-27-16-35-12-913.png (236 kB)
        3. image-2026-03-30-10-15-40-787.png (553 kB)
        4. image-2026-03-30-17-43-07-316.png (210 kB)
        5. image-2026-03-30-17-44-20-810.png (131 kB)

            Assignee: Luke Pearson
            Reporter: Luke Pearson
            Votes: 0
            Watchers: 5

              Created:
              Updated:
              Resolved: