Identify statistics to track block cache contention

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Block Cache
    • Storage Engines - Persistence
    • 371.154
    • SE Persistence backlog

      Motivation
      Before enabling the block cache in production, we need visibility into the lock contention it may introduce. The cache is a sharded ConcurrentSizedLRUCache (src/mongo/db/modules/atlas/src/disagg_storage/pali/block_cache/sized_lru_cache.h). Every find/add/erase/getErase call acquires the per-shard synchronized_value mutex, and the global Counter64 hit/miss counters in pali_block_cache.cpp are incremented on every get/put. Under high throughput these are the natural contention points, but nothing measures them today.

      Approach

      • Instrument per-shard mutex wait time / contended-acquire counts in ConcurrentSizedLRUCache (sized_lru_cache.h).
      • Surface the new counters in the existing blockCache ServerStatus section (pali_block_cache.cpp, BlockCacheServerStatus) so they flow into FTDC.
      • Keep the instrumentation cheap (e.g. relaxed atomics / try-lock fast path) so it does not itself become a bottleneck.
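The try-lock fast path with relaxed atomics mentioned above could look roughly like the following sketch. It is illustrative only: `InstrumentedShardMutex` and `ShardContentionStats` are hypothetical names, the sketch wraps a plain `std::mutex`, and how such a wrapper would plug into the existing synchronized_value per-shard lock is an assumption this ticket does not settle.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical per-shard counters; all names here are illustrative.
struct ShardContentionStats {
    std::atomic<std::uint64_t> acquisitions{0};           // every lock() call
    std::atomic<std::uint64_t> contendedAcquisitions{0};  // lock() calls that had to wait
    std::atomic<std::uint64_t> waitNanos{0};              // total time spent waiting
};

// Hypothetical wrapper around std::mutex. It satisfies BasicLockable, so it
// can be used with std::lock_guard.
class InstrumentedShardMutex {
public:
    void lock() {
        _stats.acquisitions.fetch_add(1, std::memory_order_relaxed);

        // Fast path: an uncontended acquire costs one try_lock and one
        // relaxed increment, with no clock reads.
        if (_mutex.try_lock()) {
            return;
        }

        // Slow path: record the contended acquire, then time the blocking wait.
        _stats.contendedAcquisitions.fetch_add(1, std::memory_order_relaxed);
        const auto start = std::chrono::steady_clock::now();
        _mutex.lock();
        const auto waited = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
        _stats.waitNanos.fetch_add(static_cast<std::uint64_t>(waited.count()),
                                   std::memory_order_relaxed);
    }

    void unlock() {
        _mutex.unlock();
    }

    const ShardContentionStats& stats() const {
        return _stats;
    }

private:
    std::mutex _mutex;
    ShardContentionStats _stats;
};
```

A caller would take the lock as usual, for example `std::lock_guard<InstrumentedShardMutex> lk(shardMutex);`. Note that `std::mutex::try_lock` is permitted to fail spuriously, so the contended count is an upper bound rather than an exact figure. Keeping the counters per shard, rather than in one global counter, avoids adding a new shared cache line that every shard would write to.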

      Definition of Done

      • New contention stat(s) appear under db.serverStatus().blockCache and in FTDC.
      • Stats are validated by a unit test in pali_test.cpp (or sized_lru_cache_test.cpp) that drives concurrent access and observes the counters move.
      • The instrumentation adds negligible overhead when the cache is uncontended.
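For the unit-test item above, one way to make "observes the counters move" deterministic is to force a blocked acquire rather than rely on timing. The sketch below assumes the contended counter is incremented before the waiter blocks, so the test can wait for it. `CountingMutex` and `forceContendedAcquire` are hypothetical stand-ins, not the real cache or test fixtures.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an instrumented shard lock; only the
// contended-acquire count is modeled.
class CountingMutex {
public:
    void lock() {
        if (_mutex.try_lock()) {
            return;
        }
        // Count before blocking so an observer can see the waiter arrive.
        _contended.fetch_add(1, std::memory_order_relaxed);
        _mutex.lock();
    }

    void unlock() {
        _mutex.unlock();
    }

    std::uint64_t contended() const {
        return _contended.load(std::memory_order_relaxed);
    }

private:
    std::mutex _mutex;
    std::atomic<std::uint64_t> _contended{0};
};

// Forces at least one blocked acquire and returns the contended count.
inline std::uint64_t forceContendedAcquire() {
    CountingMutex m;
    m.lock();  // The calling thread holds the lock.

    std::thread waiter([&m] {
        // try_lock fails while the lock is held, so this acquire is counted
        // and then blocks until the holder releases.
        std::lock_guard<CountingMutex> lk(m);
    });

    // Spin until the waiter has recorded its failed try_lock.
    while (m.contended() == 0) {
        std::this_thread::yield();
    }

    m.unlock();
    waiter.join();
    return m.contended();
}
```

A test built this way would assert that the returned count is at least 1. Asserting an exact value is not safe, because `std::mutex::try_lock` may fail spuriously and inflate the count.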

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel

              Created:
              Updated:
              Resolved: