Add observability to PALI block cache


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Block Cache
    • Storage Engines - Persistence
    • 371.201
    • SE Persistence backlog
    • None
    • PALI block cache observability

      The new block cache (the "pali" block cache in the mongo/atlas disaggregated-storage module, src/mongo/db/modules/atlas/src/disagg_storage/pali/block_cache/) is a sharded ConcurrentSizedLRUCache keyed by (table_id, page_id, lsn). It works today but is disabled by default: the startup knobs disaggBlockCacheSizeBytes and disaggBlockCacheNumShards both default to 0. It also exposes only basic stats under the blockCache ServerStatus section.
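
      For readers unfamiliar with the layout, the keying and sharding described above can be sketched roughly as follows. This is an illustration only: BlockKey, BlockKeyHash and shardFor are hypothetical names, the 64-bit field widths are assumed, and treating a shard count of 0 as "disabled" is an assumption based on the knob defaults. It does not reproduce ConcurrentSizedLRUCache.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <optional>

// Hypothetical key type; field widths are assumed, not taken from the module.
struct BlockKey {
    std::uint64_t tableId;
    std::uint64_t pageId;
    std::uint64_t lsn;

    bool operator==(const BlockKey& other) const {
        return tableId == other.tableId && pageId == other.pageId && lsn == other.lsn;
    }
};

// Folds the three fields into one hash value (hash_combine-style mixing).
struct BlockKeyHash {
    std::size_t operator()(const BlockKey& key) const noexcept {
        std::size_t h = std::hash<std::uint64_t>{}(key.tableId);
        for (std::uint64_t field : {key.pageId, key.lsn}) {
            h ^= std::hash<std::uint64_t>{}(field) + 0x9e3779b9u + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Returns the shard index for a key, or nullopt when numShards is 0
// (assumed here to mean the cache is disabled).
inline std::optional<std::size_t> shardFor(const BlockKey& key, std::size_t numShards) {
    if (numShards == 0) {
        return std::nullopt;
    }
    return BlockKeyHash{}(key) % numShards;
}
```

      Because the shard is derived from the key alone, every lookup for the same (table_id, page_id, lsn) lands on the same shard, so per-shard contention is a natural unit to measure.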

      This epic tracks the remaining pieces to bring the block cache to production:

      Observability

      • Stats to track any contention related to the block cache
      • Stats to track perf improvements gained from the block cache
      • A stat to report whether the block cache is enabled/disabled
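
      As a starting point for discussion, the three kinds of stats listed above could be collected along these lines. All names here (BlockCacheCounters, BlockCacheReport, makeReport, the shardLockWait fields) are hypothetical and are not existing fields of the blockCache section. Deriving "enabled" from a non-zero configured size is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counters, updated on the cache hot path with relaxed atomics.
struct BlockCacheCounters {
    std::atomic<std::uint64_t> hits{0};
    std::atomic<std::uint64_t> misses{0};
    std::atomic<std::uint64_t> shardLockWaits{0};       // contended lock acquisitions
    std::atomic<std::uint64_t> shardLockWaitMicros{0};  // total time spent waiting
};

// Point-in-time view suitable for reporting.
struct BlockCacheReport {
    bool enabled;
    std::uint64_t hits;
    std::uint64_t misses;
    std::uint64_t shardLockWaits;
    std::uint64_t shardLockWaitMicros;
    double hitRatio;
};

// Builds a report from the counters. Assumption: a configured size of 0
// bytes means the cache is disabled.
inline BlockCacheReport makeReport(const BlockCacheCounters& counters,
                                   std::uint64_t cacheSizeBytes) {
    BlockCacheReport report{};
    report.enabled = cacheSizeBytes > 0;
    report.hits = counters.hits.load(std::memory_order_relaxed);
    report.misses = counters.misses.load(std::memory_order_relaxed);
    report.shardLockWaits = counters.shardLockWaits.load(std::memory_order_relaxed);
    report.shardLockWaitMicros = counters.shardLockWaitMicros.load(std::memory_order_relaxed);

    const std::uint64_t lookups = report.hits + report.misses;
    report.hitRatio = lookups == 0
        ? 0.0
        : static_cast<double>(report.hits) / static_cast<double>(lookups);
    return report;
}
```

      The hit ratio stands in for the "perf improvement" stat and the lock-wait counters for the "contention" stat. Both are candidates only, and the real choice may depend on what the cache already tracks internally.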

      Admission policy

      • A config item that sets the percentage of clean pages admitted to the block cache
      • Ensure pages belonging to a cold collection never go into the block cache
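
      One possible shape for the two admission rules above is sketched below. PageInfo, AdmissionConfig and shouldAdmit are hypothetical names. How a collection is classified as cold is not defined in this epic, so it is passed in as a flag. Rejecting dirty pages outright and sampling by hash are both assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical description of a page being considered for admission.
struct PageInfo {
    std::uint64_t tableId;
    std::uint64_t pageId;
    bool isClean;
    bool collectionIsCold;  // classification is decided elsewhere (assumed)
};

// Hypothetical config item: share of clean pages to admit, 0 to 100.
struct AdmissionConfig {
    unsigned cleanPageAdmitPercent;
};

// Decides whether a page may enter the block cache.
inline bool shouldAdmit(const PageInfo& page, const AdmissionConfig& config) {
    if (page.collectionIsCold) {
        return false;  // cold collections are never cached
    }
    if (!page.isClean) {
        return false;  // assumption: only clean pages are candidates
    }
    if (config.cleanPageAdmitPercent >= 100) {
        return true;
    }
    // Deterministic sampling: the same page always gets the same decision,
    // so a page does not flap in and out of the cache between reads.
    const std::size_t h =
        std::hash<std::uint64_t>{}(page.tableId * 1000003u ^ page.pageId);
    return h % 100 < config.cleanPageAdmitPercent;
}
```

      Hash-based sampling only approximates the configured percentage and depends on how page ids are distributed. A counter-based or randomized policy would be an equally valid choice.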

      Enablement

      • Enable the block cache by default with a fixed size or a percentage of total available memory
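
      The two sizing options above (fixed size or percentage of memory) could be resolved to a byte count as in this sketch. FixedSizeBytes, PercentOfMemory, CacheSizePolicy and resolveCacheSizeBytes are hypothetical names. How total available memory is obtained is out of scope here, so it is passed in as a parameter.

```cpp
#include <algorithm>
#include <cstdint>
#include <variant>

// Hypothetical sizing policies: exactly one of the two is selected.
struct FixedSizeBytes {
    std::uint64_t bytes;
};
struct PercentOfMemory {
    unsigned percent;  // values above 100 are clamped to 100
};
using CacheSizePolicy = std::variant<FixedSizeBytes, PercentOfMemory>;

// Resolves a policy to a concrete cache size in bytes.
inline std::uint64_t resolveCacheSizeBytes(const CacheSizePolicy& policy,
                                           std::uint64_t totalMemoryBytes) {
    if (const auto* fixed = std::get_if<FixedSizeBytes>(&policy)) {
        return fixed->bytes;
    }
    const unsigned percent = std::min(std::get<PercentOfMemory>(policy).percent, 100u);
    // Divide first to avoid overflow on large memory sizes; this rounds
    // the result down by less than 100 bytes per percentage point.
    return totalMemoryBytes / 100 * percent;
}
```

      The resolved value would then play the role that disaggBlockCacheSizeBytes plays today. Whether the default should be expressed as a fixed size or a percentage is the open question this item tracks.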

      Fleet monitoring

      • Grafana panel for the number of clusters with the block cache enabled
      • Grafana panel for the memory available on clusters where the block cache is enabled

            Assignee:
            Etienne Petrel
            Reporter:
            Etienne Petrel
