Pre-size block read scratch buffer to maxleafpage to eliminate realloc contention


    • Storage Engines

      Problem

      Off-CPU profiling of a read-heavy workload shows that 33.4% of total off-CPU time is spent in futex contention on the tcmalloc spinlock. The dominant stack runs through __wt_blkcache_read -> __wt_buf_grow_worker -> __libc_realloc.

      Here's what's happening: __wt_blkcache_read allocates a 4KB scratch buffer to decompress pages into, but the typical leaf page is 32KB (maxleafpage). Every compressed page read therefore forces the buffer to grow via realloc, and each realloc has to compete with all concurrent reader threads for tcmalloc's HugePageAwareAllocator::LockAndAlloc spinlock. Each contended acquisition costs 1-5ms of off-CPU time.

      Solution

      Pre-size the scratch buffer in __wt_blkcache_read to S2BT(session)->maxleafpage instead of the hardcoded 4KB. For the common case (page size <= maxleafpage), this eliminates the realloc entirely.

      The scratch buffer pool already handles buffer lifecycle and reuse – this change only adjusts the initial allocation size. If a page is larger than maxleafpage (e.g. an internal page), the existing grow-on-demand path kicks in exactly as it does today.

      This is a single-line edit in block_io.c.
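
The effect of the initial size on the number of grow calls can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the WiredTiger code: struct scratch_buf, scratch_init, scratch_ensure and count_grows are hypothetical names, and each read is modeled as starting from a fresh buffer of the initial size, which ignores whatever reuse the scratch buffer pool provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scratch buffer; not WiredTiger's own type. */
struct scratch_buf {
    void *mem;           /* backing memory */
    size_t memsize;      /* current capacity in bytes */
    unsigned grow_calls; /* how many times realloc was needed */
};

/* Allocate a scratch buffer with the given initial capacity. */
static int scratch_init(struct scratch_buf *buf, size_t initial_size)
{
    assert(buf != NULL);
    buf->mem = malloc(initial_size);
    if (buf->mem == NULL)
        return -1;
    buf->memsize = initial_size;
    buf->grow_calls = 0;
    return 0;
}

/* Grow on demand: realloc only when the request exceeds the capacity. */
static int scratch_ensure(struct scratch_buf *buf, size_t needed)
{
    void *p;

    assert(buf != NULL);
    if (needed <= buf->memsize)
        return 0; /* common case after pre-sizing: no allocator call */
    p = realloc(buf->mem, needed);
    if (p == NULL)
        return -1;
    buf->mem = p;
    buf->memsize = needed;
    buf->grow_calls++;
    return 0;
}

/*
 * Model `reads` page reads of `page_size` bytes, each starting from a
 * buffer of `initial_size` bytes. Returns the total number of grow
 * calls, or -1 on allocation failure.
 */
static long count_grows(size_t initial_size, size_t page_size, unsigned reads)
{
    long total = 0;
    unsigned i;

    for (i = 0; i < reads; i++) {
        struct scratch_buf buf;

        if (scratch_init(&buf, initial_size) != 0)
            return -1;
        if (scratch_ensure(&buf, page_size) != 0) {
            free(buf.mem);
            return -1;
        }
        total += buf.grow_calls;
        free(buf.mem);
    }
    return total;
}
```

In this model, count_grows(4096, 32768, n) performs one grow per read, while count_grows(32768, 32768, n) performs none. A page larger than the initial size (for example 64KB against a 32KB buffer) still takes the grow path, matching the fallback described above.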

            Assignee:
            Jawwad Asghar
            Reporter:
            Jawwad Asghar
