Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
21.147
Problem
On every WiredTiger session release, WiredTigerKVEngine::sizeStorerPeriodicFlush acquires _sizeStorerSyncTrackerMutex just to ask the ElapsedTracker debouncer whether the periodic interval has elapsed, and the answer is no on more than 99.7% of calls (defaults: 100k hits or 60s between flushes). On a YCSB 100% read workload running 172k session releases/sec, that single mutex acquire/release pair accounts for 0.26% of total CPU: _releaseSession → pthread_mutex_lock is 4.32% and _releaseSession → __pthread_mutex_unlock_usercnt is 6.99%, with sizeStorerPeriodicFlush → pthread_mutex_lock accounting for 74% of the function's own time. The mutex protects the ElapsedTracker._pings / _last bookkeeping, not a correctness invariant: if two threads raced past the precheck and both called syncSizeInfo(false), the result would be benign, because the actual flush is already thread-safe via WiredTiger's metadata cursor write.
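The contended pattern can be sketched with a minimal stand-in (names and shape hypothetical, modeled loosely on an ElapsedTracker-style hit-count/interval debounce, not MongoDB's actual code): every call pays a mutex lock/unlock just to learn that nothing needs flushing.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of the current hot path: a debouncer whose
// ping/last-time bookkeeping is guarded by a mutex, so 100% of callers
// pay the lock even though the answer is almost always "no".
class LockedDebouncer {
public:
    LockedDebouncer(int64_t hitLimit, int64_t periodMillis)
        : _hitLimit(hitLimit), _periodMillis(periodMillis) {}

    // Called on every session release; returns true when the periodic
    // interval has elapsed, by hit count or by wall clock.
    bool intervalHasElapsed(int64_t nowMillis) {
        std::lock_guard<std::mutex> lk(_mutex);  // taken on every call
        if (++_pings >= _hitLimit || nowMillis - _last >= _periodMillis) {
            _pings = 0;       // reset the hit counter on trigger
            _last = nowMillis;
            return true;
        }
        return false;
    }

private:
    std::mutex _mutex;
    const int64_t _hitLimit;
    const int64_t _periodMillis;
    int64_t _pings = 0;
    int64_t _last = 0;
};
```

At the defaults described above (100k hits / 60s), the lock_guard in this sketch is acquired 172k times per second on the YCSB workload while returning true only a handful of times per minute.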
Solution
Replace the per-call mutex with a lock-free atomic precheck that mirrors the ElapsedTracker thresholds. Two new AtomicWord<int64_t> members on WiredTigerKVEngine, _sizeStorerHitsSinceFlush and _sizeStorerLastFlushMillis, gate the fast path: a fetchAndAdd(1) on the hit counter plus a loadRelaxed of the last-flush timestamp, compared against gWiredTigerSizeStorerPeriodicSyncHits and gWiredTigerSizeStorerPeriodicSyncPeriodMillis respectively. The slow path retains the original mutex around ElapsedTracker::intervalHasElapsed(), so the tracker remains the source of truth for whether to flush, and the atomics are reset to zero / nowMs under the mutex when a flush actually fires. The hit-count and time triggers are unchanged, as is the reset-on-flush behavior; the only difference is that the >99.7% common case no longer takes a mutex.
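A minimal sketch of the proposed two-phase gate, under stated assumptions: plain constants stand in for gWiredTigerSizeStorerPeriodicSyncHits / gWiredTigerSizeStorerPeriodicSyncPeriodMillis, std::atomic stands in for AtomicWord, and a re-check under the mutex stands in for the ElapsedTracker consultation; the class and method names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical thresholds, standing in for the server parameters.
constexpr int64_t kSyncHits = 100'000;
constexpr int64_t kSyncPeriodMillis = 60'000;

class SizeStorerFlushGate {
public:
    // Returns true when this call should trigger a flush. The fast path
    // touches only atomics; the mutex is taken only once a threshold has
    // apparently been crossed.
    bool hit(int64_t nowMillis) {
        int64_t hits =
            _hitsSinceFlush.fetch_add(1, std::memory_order_relaxed) + 1;
        int64_t last = _lastFlushMillis.load(std::memory_order_relaxed);
        if (hits < kSyncHits && nowMillis - last < kSyncPeriodMillis)
            return false;  // the >99.7% case: no lock taken

        // Slow path: re-check under the mutex so that of two racing
        // threads, only one actually fires the flush.
        std::lock_guard<std::mutex> lk(_mutex);
        hits = _hitsSinceFlush.load(std::memory_order_relaxed);
        last = _lastFlushMillis.load(std::memory_order_relaxed);
        if (hits < kSyncHits && nowMillis - last < kSyncPeriodMillis)
            return false;  // a racer already flushed and reset the gate

        // Reset the atomics under the mutex, as described above.
        _hitsSinceFlush.store(0, std::memory_order_relaxed);
        _lastFlushMillis.store(nowMillis, std::memory_order_relaxed);
        return true;
    }

private:
    std::mutex _mutex;
    std::atomic<int64_t> _hitsSinceFlush{0};
    std::atomic<int64_t> _lastFlushMillis{0};
};
```

Relaxed ordering suffices here because the atomics are only a heuristic gate: a stale read at worst sends a thread down the slow path, where the mutex serializes the real decision, matching the ticket's observation that a duplicate syncSizeInfo(false) would be benign anyway.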