- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Product Performance
Problem
query_stats::registerRequestImpl is called for every find command. At the default window-based configuration (256 req/sec), 0.53% of cumulative profile time in registerRequestImpl traces into WindowBasedPolicy::handle(), which unconditionally acquires a stdx::unique_lock on a global _windowMutex and calls SystemClockSource::now() (a clock_gettime syscall) for every request. At high throughput, approximately 99.94% of requests find _currentCount >= _requestLimit and return false — yet every thread acquires the mutex and reads the clock before discovering this, producing measurable pthread_mutex_lock contention across many concurrent threads.
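A minimal sketch of the hot path described above, with std::mutex, std::atomic, and std::chrono standing in for MongoDB's stdx::unique_lock, AtomicWord, and SystemClockSource (the real class has more state and different signatures; names here mirror the ticket, not the actual source):

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of WindowBasedPolicy's current hot path. Every caller
// pays for the lock and the clock read, even though at high throughput
// ~99.94% of requests are rejected by the limit check below.
class WindowBasedPolicy {
public:
    explicit WindowBasedPolicy(uint32_t requestLimit) : _requestLimit(requestLimit) {}

    // Called for every find command via query_stats::registerRequestImpl.
    bool handle() {
        std::unique_lock<std::mutex> lock(_windowMutex);  // global mutex: contended
        auto now = std::chrono::steady_clock::now();      // clock_gettime syscall
        if (now - _windowStart >= std::chrono::seconds(1)) {
            _windowStart = now;   // new one-second window
            _currentCount = 0;
        }
        if (_currentCount >= _requestLimit)
            return false;  // over the per-window budget: not sampled
        ++_currentCount;
        return true;       // admitted for query-stats collection
    }

private:
    std::mutex _windowMutex;
    std::chrono::steady_clock::time_point _windowStart = std::chrono::steady_clock::now();
    uint32_t _currentCount = 0;
    const uint32_t _requestLimit;
};
```

The key point is that the limit check happens only after both the lock acquisition and the syscall, so rejected requests (the overwhelming majority) pay the full cost.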
Solution
Make _currentCount an AtomicWord<uint32_t> and add a relaxed-load pre-check at the top of WindowBasedPolicy::handle(), before acquiring the mutex. If _currentCount.loadRelaxed() >= _requestLimit.loadRelaxed(), return false immediately — skipping both the lock and the clock read. The mutex-protected slow path remains authoritative for the small fraction of requests that actually get admitted. A stale-low pre-check (the pre-check passes but the window is actually full) is handled correctly under the lock; a stale-high pre-check (rejecting a request that could have been admitted) is harmless for observability-only sampling data.
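The proposed change can be sketched as follows, again with std::atomic and std::mutex as stand-ins for AtomicWord and stdx::unique_lock (memory_order_relaxed plays the role of loadRelaxed(); this is an illustrative sketch, not the actual patch):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of handle() with the relaxed-load fast path added.
class WindowBasedPolicy {
public:
    explicit WindowBasedPolicy(uint32_t requestLimit) : _requestLimit(requestLimit) {}

    bool handle() {
        // Fast path: no lock, no clock read. A stale-high count rejects a
        // request that might have been admitted (harmless for sampling);
        // a stale-low count falls through to the authoritative slow path.
        if (_currentCount.load(std::memory_order_relaxed) >=
            _requestLimit.load(std::memory_order_relaxed))
            return false;

        // Slow path: unchanged and still authoritative under the lock.
        std::unique_lock<std::mutex> lock(_windowMutex);
        auto now = std::chrono::steady_clock::now();
        if (now - _windowStart >= std::chrono::seconds(1)) {
            _windowStart = now;
            _currentCount.store(0, std::memory_order_relaxed);
        }
        if (_currentCount.load(std::memory_order_relaxed) >=
            _requestLimit.load(std::memory_order_relaxed))
            return false;  // re-check under the lock: handles stale-low pre-checks
        _currentCount.fetch_add(1, std::memory_order_relaxed);
        return true;
    }

private:
    std::mutex _windowMutex;
    std::chrono::steady_clock::time_point _windowStart = std::chrono::steady_clock::now();
    std::atomic<uint32_t> _currentCount{0};
    std::atomic<uint32_t> _requestLimit;
};
```

Relaxed ordering suffices because the pre-check is only an admission hint: correctness is enforced by the re-check under the mutex, so no cross-thread happens-before relationship is needed on the fast path.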