- Type: Improvement
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Btree
- Security Level: Public (Available to anyone on the web)
- Storage Engines
Problem
Off-CPU profiling of a read-heavy workload shows 33.4% of total off-CPU time stuck in futex contention on the tcmalloc spinlock. The dominant stack traces back through __wt_blkcache_read -> __wt_buf_grow_worker -> __libc_realloc.
Here's what's happening: __wt_blkcache_read allocates a 4KB scratch buffer to decompress pages, but the typical leaf page is 32KB (maxleafpage). So every compressed page read forces the buffer to grow via realloc, and that realloc has to contend for tcmalloc's HugePageAwareAllocator::LockAndAlloc spinlock with every other concurrent reader thread. Each contended allocation costs 1-5ms of off-CPU time.
Solution
Pre-size the scratch buffer in __wt_blkcache_read to S2BT(session)->maxleafpage instead of the hardcoded 4KB. For the common case (page size <= maxleafpage), this eliminates the realloc entirely.
The scratch buffer pool already handles buffer lifecycle and reuse – this change only adjusts the initial allocation size. If a page is larger than maxleafpage (e.g. an internal page), the existing grow-on-demand path kicks in exactly as it does today.
This is a single-line edit in block_io.c.
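To illustrate the effect, here is a minimal standalone model of a grow-on-demand scratch buffer. It is not WiredTiger code: struct scratch_buf, scratch_alloc, scratch_ensure, and simulate_reads are hypothetical names invented for this sketch, and it assumes each read starts from a buffer of the initial size. It only counts how often realloc is reached for a given initial size and page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scratch buffer; not WiredTiger's own type. */
struct scratch_buf {
    void *mem;
    size_t memsize;
};

/* Allocate a buffer with the given initial capacity. */
static int
scratch_alloc(struct scratch_buf *buf, size_t initial_size)
{
    buf->mem = malloc(initial_size);
    if (buf->mem == NULL)
        return (-1);
    buf->memsize = initial_size;
    return (0);
}

/* Grow on demand: sets *grew to 1 only if realloc was actually needed. */
static int
scratch_ensure(struct scratch_buf *buf, size_t needed, int *grew)
{
    void *p;

    *grew = 0;
    if (needed <= buf->memsize)
        return (0); /* Fast path: no allocator call at all. */
    if ((p = realloc(buf->mem, needed)) == NULL)
        return (-1);
    buf->mem = p;
    buf->memsize = needed;
    *grew = 1;
    return (0);
}

static void
scratch_free(struct scratch_buf *buf)
{
    free(buf->mem);
    buf->mem = NULL;
    buf->memsize = 0;
}

/*
 * Model nreads page reads, each starting from a buffer of initial_size and
 * needing page_size bytes. Returns the number of realloc calls, or -1 on
 * allocation failure.
 */
static long
simulate_reads(size_t initial_size, size_t page_size, unsigned nreads)
{
    struct scratch_buf buf;
    long grows = 0;
    unsigned i;
    int grew;

    for (i = 0; i < nreads; ++i) {
        if (scratch_alloc(&buf, initial_size) != 0)
            return (-1);
        if (scratch_ensure(&buf, page_size, &grew) != 0) {
            scratch_free(&buf);
            return (-1);
        }
        assert(buf.memsize >= page_size);
        grows += grew;
        scratch_free(&buf);
    }
    return (grows);
}
```

In this model, simulate_reads(4096, 32768, n) reaches realloc on every read, while simulate_reads(32768, 32768, n) never does. A page larger than the initial size, for example simulate_reads(32768, 65536, n), still takes the grow path, matching the fallback described above.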
- related to: WT-17232 test_prepare33 checkpoint_and_verify_stats assertion error
- Blocked
-