- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Internal Code, Product Performance
Problem
Top::record is called on every operation completion (via AutoStatsTracker::~AutoStatsTracker) and protects all per-collection counter updates with a single global exclusive ObservableMutex<stdx::mutex> (_lockUsage). On a YCSB 128-thread in-cache read workload, FTDC measures a 1.33% contention rate on Top_lockUsage, the highest of any mutex in the system, driving 1,054 contention events/s and ~14.3ms/s of aggregate thread-stall time, plus cache-line bouncing on every lock/unlock across all 128 worker threads. The data being protected is diagnostic counters: per-collection time/count, opLatencyHistogram buckets, and a sticky isStatsRecordingAllowed flag, all values that tolerate relaxed-atomic semantics.
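The contention pattern above can be sketched as follows. This is a minimal illustration of the pre-patch shape, not the actual MongoDB code: every operation completion takes one global exclusive mutex just to bump two diagnostic counters, so all worker threads serialize on `_lockUsage`.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical simplification of the pre-patch Top: a single exclusive
// mutex guards every per-collection counter update.
struct UsageData {
    long long time = 0;   // accumulated operation time
    long long count = 0;  // number of completed operations
};

class Top {
public:
    // Called on every operation completion; the global exclusive lock
    // here is the contention hotspot described above.
    void record(const std::string& ns, long long micros) {
        std::lock_guard<std::mutex> lk(_lockUsage);
        UsageData& u = _usage[ns];
        u.time += micros;
        u.count += 1;
    }

    UsageData get(const std::string& ns) {
        std::lock_guard<std::mutex> lk(_lockUsage);
        return _usage[ns];
    }

private:
    std::mutex _lockUsage;                    // one lock for all collections
    std::map<std::string, UsageData> _usage;  // per-collection counters
};
```

With 128 threads each completing operations, every `record` call contends on the same cache line holding `_lockUsage`, regardless of which collection is being updated.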
Solution
Convert the Top::CollectionData counters to atomic types (Atomic<long long> for UsageData::time/count, AtomicOperationLatencyHistogram for opLatencyHistogram, Atomic<bool> for isStatsRecordingAllowed) and switch _lockUsage from an exclusive mutex to ObservableSharedMutex (std::shared_mutex). The fast path in record() now takes stdx::shared_lock, allowing all 128 threads to update existing-collection counters concurrently with no serialization; the exclusive lock is taken only for first-time collection insertion and in collectionDropped(). The patch reuses AtomicOperationLatencyHistogram, which already exists for ServiceLatencyTracker and is documented as thread-safe, with partial visibility of appended samples acceptable for diagnostics. UsageMap is changed to store std::unique_ptr<CollectionData> so heap allocations stay stable across rehash (since Atomic<T> is non-movable).
- is related to:
  - SERVER-125558 Use strict comparison to choose final CE result in estimate(OIL) in cardinality estimator (Closed)
  - SERVER-125705 Fix heuristic CE source assertion in cbr_infrastructure.js (Closed)