Tune PALite configs to sustain throughput during long-running stress test runs


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Engines, Storage Engines - Foundations
    • SE Foundations - 2025-11-21
    • 5

      We’ve observed that during long-running test/format runs, the FTDC stats show poor sustained throughput. There are occasional spikes of high throughput, but the app threads spend most of the run stalled for extended periods.

      Upon investigation, we found that:

      • Eviction threads were locking a page for several minutes.
      • This caused all app threads accessing that page to stall.
      • The long eviction times were due to sync calls from PALite.
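      The chain above can be illustrated with a toy cost model. It assumes, hypothetically, that an eviction writes a fixed number of blocks while holding the page lock and that each PALite sync has a fixed cost. `EvictionLockModel` and every number in it are illustrative only, not PALite internals or measurements. The model shows that syncing after each write multiplies the lock hold time by the number of writes, and every app thread waiting on that page inherits the full delay.

```python
from dataclasses import dataclass


@dataclass
class EvictionLockModel:
    """Toy model of how long one eviction holds a page locked.

    Hypothetical parameters for illustration; not measured PALite costs.
    """

    writes_per_eviction: int  # blocks written while the page is locked
    write_ms: float           # cost of one buffered write
    sync_ms: float            # cost of one sync call
    sync_every_write: bool    # True: sync after each write; False: one sync at the end

    def syncs(self) -> int:
        # Number of sync calls issued while the page lock is held.
        return self.writes_per_eviction if self.sync_every_write else 1

    def lock_hold_ms(self) -> float:
        # Any app thread that needs this page waits for the whole hold time.
        return self.writes_per_eviction * self.write_ms + self.syncs() * self.sync_ms


per_write = EvictionLockModel(100, 0.5, 5.0, sync_every_write=True)
batched = EvictionLockModel(100, 0.5, 5.0, sync_every_write=False)
print(per_write.lock_hold_ms(), batched.lock_hold_ms())  # 550.0 55.0
```

      With these made-up costs, the sync term dominates: 500 of the 550 ms hold comes from per-write syncs.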

      alexander.blekhman@mongodb.com and I tested a few configuration changes and found that with adjustments such as:

      • Increasing cache size
      • Avoiding syncing on every write

      we were able to achieve sustained throughput for longer durations and eliminate the extended stalls.
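      The "avoid syncing on every write" adjustment amounts to batching syncs. Below is a minimal sketch of that policy, assuming a plain append-only file and `os.fsync`. `BatchedSyncWriter`, `sync_every`, and `count_syncs` are hypothetical names for illustration only. They are not PALite's API and not its actual configuration parameters.

```python
import os
import tempfile


class BatchedSyncWriter:
    """Hypothetical writer that syncs every `sync_every` writes, not after each one."""

    def __init__(self, path: str, sync_every: int = 1):
        if sync_every < 1:
            raise ValueError("sync_every must be >= 1")
        self._f = open(path, "ab")
        self.sync_every = sync_every
        self.pending = 0      # writes issued since the last sync
        self.sync_calls = 0   # how many times we actually called fsync

    def write(self, data: bytes) -> None:
        self._f.write(data)
        self.pending += 1
        if self.pending >= self.sync_every:
            self.sync()

    def sync(self) -> None:
        """Flush Python's buffer, then ask the OS to persist the file."""
        if self.pending == 0:
            return
        self._f.flush()
        os.fsync(self._f.fileno())
        self.sync_calls += 1
        self.pending = 0

    def close(self) -> None:
        self.sync()  # make any trailing writes durable before closing
        self._f.close()


def count_syncs(num_writes: int, sync_every: int) -> int:
    """Run num_writes one-byte writes in a temp dir and report the fsync count."""
    with tempfile.TemporaryDirectory() as d:
        w = BatchedSyncWriter(os.path.join(d, "log.bin"), sync_every)
        for _ in range(num_writes):
            w.write(b"x")
        w.close()
        return w.sync_calls


print(count_syncs(10, 1), count_syncs(10, 4))  # 10 3
```

      Batching trades durability for latency. With `sync_every=N`, up to N-1 writes that have already returned may not yet be on stable storage if the process crashes, so the interval has to match the test's durability expectations.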

      Outcome:

      • Identify and validate the optimal PALite config parameters to mitigate the issue in the short term.
      • In Evergreen test/format (disagg.mode=leader) PALite runs, cache-stuck or timeout issues should not occur.

            Assignee:
            Alex Blekhman
            Reporter:
            Sid Mahajan
            Votes:
            0
            Watchers:
            1
