Type: Task
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Execution
Summary
nextCanonicalDouble() is called in two per-operation code paths for probabilistic sampling decisions: slow-op logging (shouldLogSlowOpWithSampling) and interrupt tracking (CurOp::startTime). On ARM (Graviton), this triggers logl() via std::uniform_real_distribution. On that platform, long double is 128-bit quad precision, which has no hardware floating-point support and is entirely software-emulated.
Profiling shows __logl_finite consuming 5.2% of CPU on a Graviton instance running YCSB 100% read.
Proposed Fix
Replace the floating-point sampling with integer-based comparison using nextUInt32():
See patch for proposed change.
See Slack conversation here.
This generates a uniform random uint32_t (pure integer arithmetic, no logl()) and compares against the rate threshold pre-scaled to the uint32 range. The probability semantics are identical.
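The comparison described above can be sketched as follows. This is a minimal illustration and not the contents of the patch: scaleRateToThreshold and shouldSample are hypothetical names, and the caller is assumed to supply the random value from nextUInt32() (any uniform 32-bit source would do). The threshold is held in a uint64_t so that a rate of 1.0 maps to 2^32 and always samples.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): scale a sampling rate in [0, 1]
// to the uint32 range. Intended to run once when the rate changes, not on
// every operation.
inline std::uint64_t scaleRateToThreshold(double rate) {
    if (!(rate > 0.0)) {
        return 0;  // rate <= 0 (or NaN): never sample
    }
    if (rate >= 1.0) {
        return std::uint64_t{1} << 32;  // above every uint32 value: always sample
    }
    return static_cast<std::uint64_t>(rate * 4294967296.0);  // rate * 2^32
}

// Hypothetical per-operation check: one integer comparison, with no
// floating-point math and no logl() call.
inline bool shouldSample(std::uint32_t randomValue, std::uint64_t threshold) {
    return randomValue < threshold;
}
```

With a uniform 32-bit input, the sampling probability under this sketch is floor(rate * 2^32) / 2^32, which is within 2^-32 of the requested rate.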
Impact
Flamegraph analysis: __logl_finite drops from 5.2% of samples to 0%. The cost of CurOp::completeAndLogOperation is cut roughly in half.
Perf Analyzer multi-patch results:
- In-cache YCSB 100% read: 5.2% throughput improvement
- In-cache YCSB 95% read / 5% write: 4% throughput improvement