- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Product Performance
Problem
query_stats::registerRequestImpl is called for every find command. At the default window-based configuration (256 req/sec), 0.53% of cumulative profile time in registerRequestImpl traces into WindowBasedPolicy::handle(), which unconditionally acquires a stdx::unique_lock on a global _windowMutex and calls SystemClockSource::now() (a clock_gettime syscall) for every request. At high throughput, approximately 99.94% of requests find _currentCount >= _requestLimit and return false — yet every thread acquires the mutex and reads the clock before discovering this, producing measurable pthread_mutex_lock contention across many concurrent threads.
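A minimal sketch of the hot path described above, with std::mutex, std::atomic, and std::chrono standing in for MongoDB's stdx::unique_lock, AtomicWord, and SystemClockSource (the real class has more state and different signatures; names here mirror the ticket, not the actual source):

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of WindowBasedPolicy's current hot path. Every caller
// pays for the lock and the clock read, even though at high throughput
// ~99.94% of requests are rejected by the limit check below.
class WindowBasedPolicy {
public:
    explicit WindowBasedPolicy(uint32_t requestLimit) : _requestLimit(requestLimit) {}

    // Called for every find command via query_stats::registerRequestImpl.
    bool handle() {
        std::unique_lock<std::mutex> lock(_windowMutex);  // global mutex: contended
        auto now = std::chrono::steady_clock::now();      // clock_gettime syscall
        if (now - _windowStart >= std::chrono::seconds(1)) {
            _windowStart = now;   // new one-second window
            _currentCount = 0;
        }
        if (_currentCount >= _requestLimit)
            return false;  // over the per-window budget: not sampled
        ++_currentCount;
        return true;       // admitted for query-stats collection
    }

private:
    std::mutex _windowMutex;
    std::chrono::steady_clock::time_point _windowStart = std::chrono::steady_clock::now();
    uint32_t _currentCount = 0;
    const uint32_t _requestLimit;
};
```

The key point is that the limit check happens only after both the lock acquisition and the syscall, so rejected requests (the overwhelming majority) pay the full cost.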
Solution
Make _currentCount an AtomicWord<uint32_t> and add a relaxed-load pre-check at the top of WindowBasedPolicy::handle(), before acquiring the mutex. If _currentCount.loadRelaxed() >= _requestLimit.loadRelaxed(), return false immediately — skipping both the lock and the clock read. The mutex-protected slow path remains authoritative for the small fraction of requests that actually get admitted. A stale-low pre-check (the pre-check passes but the window is actually full) is handled correctly under the lock; a stale-high pre-check (rejecting a request that could have been admitted) is harmless for observability-only sampling data.
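The proposed change can be sketched as follows, again with std::atomic and std::mutex as stand-ins for AtomicWord and stdx::unique_lock (memory_order_relaxed plays the role of loadRelaxed(); this is an illustrative sketch, not the actual patch):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of handle() with the relaxed-load fast path added.
class WindowBasedPolicy {
public:
    explicit WindowBasedPolicy(uint32_t requestLimit) : _requestLimit(requestLimit) {}

    bool handle() {
        // Fast path: no lock, no clock read. A stale-high count rejects a
        // request that might have been admitted (harmless for sampling);
        // a stale-low count falls through to the authoritative slow path.
        if (_currentCount.load(std::memory_order_relaxed) >=
            _requestLimit.load(std::memory_order_relaxed))
            return false;

        // Slow path: unchanged and still authoritative under the lock.
        std::unique_lock<std::mutex> lock(_windowMutex);
        auto now = std::chrono::steady_clock::now();
        if (now - _windowStart >= std::chrono::seconds(1)) {
            _windowStart = now;
            _currentCount.store(0, std::memory_order_relaxed);
        }
        if (_currentCount.load(std::memory_order_relaxed) >=
            _requestLimit.load(std::memory_order_relaxed))
            return false;  // re-check under the lock: handles stale-low pre-checks
        _currentCount.fetch_add(1, std::memory_order_relaxed);
        return true;
    }

private:
    std::mutex _windowMutex;
    std::chrono::steady_clock::time_point _windowStart = std::chrono::steady_clock::now();
    std::atomic<uint32_t> _currentCount{0};
    std::atomic<uint32_t> _requestLimit;
};
```

Relaxed ordering suffices because the pre-check is only an admission hint: correctness is enforced by the re-check under the mutex, so no cross-thread happens-before relationship is needed on the fast path.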