Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Query Integration
200
In the sys-perf flamegraphs on master from BF-41814, SimpleMemoryUsageTracker::add accounts for ~1.72% of total CPU samples in the SortStage::doGetNext bounded sort path, even though chunking is already enabled. The overhead comes from work that runs on every call, even when no chunk boundary is crossed:
- _inUseTrackedMemoryBytes += diff
- tassert underflow check
- Peak memory comparison and update
- Integer division for chunk boundary check (_inUseTrackedMemoryBytes / _chunkSize)
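To make the per-call cost concrete, here is a simplified C++ model of the work listed above. It is a sketch, not the real SimpleMemoryUsageTracker. The class name NaiveChunkedTracker, the _reportedChunks counter, and the reports() counter standing in for propagation to a parent tracker are all hypothetical, and assert stands in for tassert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the per-call work in SimpleMemoryUsageTracker::add.
// Member names mirror the ticket; the real implementation differs.
class NaiveChunkedTracker {
public:
    explicit NaiveChunkedTracker(std::int64_t chunkSize) : _chunkSize(chunkSize) {}

    void add(std::int64_t diff) {
        _inUseTrackedMemoryBytes += diff;
        // Stand-in for the tassert underflow check, evaluated on every call.
        assert(_inUseTrackedMemoryBytes >= 0);
        // Peak comparison and update, evaluated on every call.
        if (_inUseTrackedMemoryBytes > _peakTrackedMemoryBytes) {
            _peakTrackedMemoryBytes = _inUseTrackedMemoryBytes;
        }
        // Chunk boundary check: one integer division per call, although the
        // quotient only changes when a chunk boundary is crossed.
        std::int64_t chunks = _inUseTrackedMemoryBytes / _chunkSize;
        if (chunks != _reportedChunks) {
            _reportedChunks = chunks;
            ++_reports;  // hypothetical hook: where a parent tracker would be updated
        }
    }

    std::int64_t inUse() const { return _inUseTrackedMemoryBytes; }
    std::int64_t peak() const { return _peakTrackedMemoryBytes; }
    std::int64_t reportedBytes() const { return _reportedChunks * _chunkSize; }
    int reports() const { return _reports; }

private:
    std::int64_t _chunkSize;
    std::int64_t _inUseTrackedMemoryBytes = 0;
    std::int64_t _peakTrackedMemoryBytes = 0;
    std::int64_t _reportedChunks = 0;
    int _reports = 0;
};
```

With a 100-byte chunk, calls of +30, +30, and +50 cross one boundary, so only the last of the three changes the reported value, yet all three pay for the division, the underflow check, and the peak comparison.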
The bounded sorter calls _memoryTracker.add() for every document added and every document returned, which makes this a very hot path.
Areas of interest to investigate:
- Whether the tassert check can be removed or made cheaper in release builds.
- Whether the integer division for chunk boundary detection can be replaced with a cheaper comparison.
- Whether call frequency can be reduced by batching updates or only tracking net deltas at a coarser granularity.
- Whether the peak tracking comparison can be deferred or sampled.
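One possible direction for the division and tassert questions, sketched under the same assumptions (hypothetical names, assert in place of tassert, and not the actual MongoDB code): cache the byte range of the current chunk so the common case costs two comparisons, and move both the division and the underflow check into the rare path where a boundary is crossed. Because the cached lower boundary is never negative, an underflow always falls outside the range and would still be caught on that path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant: cache [_lowerBoundary, _upperBoundary) for the current
// chunk so the common case avoids the integer division entirely.
class BoundaryCachedTracker {
public:
    explicit BoundaryCachedTracker(std::int64_t chunkSize)
        : _chunkSize(chunkSize), _upperBoundary(chunkSize) {}

    void add(std::int64_t diff) {
        _inUseTrackedMemoryBytes += diff;
        // Peak tracking is left on the fast path in this sketch.
        if (_inUseTrackedMemoryBytes > _peakTrackedMemoryBytes) {
            _peakTrackedMemoryBytes = _inUseTrackedMemoryBytes;
        }
        // Fast path: still inside the current chunk, nothing else to do.
        if (_inUseTrackedMemoryBytes >= _lowerBoundary &&
            _inUseTrackedMemoryBytes < _upperBoundary) {
            return;
        }
        onBoundaryCrossed();
    }

    std::int64_t inUse() const { return _inUseTrackedMemoryBytes; }
    std::int64_t peak() const { return _peakTrackedMemoryBytes; }
    std::int64_t reportedBytes() const { return _lowerBoundary; }
    int reports() const { return _reports; }

private:
    // Slow path, taken only when the in-use value leaves the cached range.
    void onBoundaryCrossed() {
        // _lowerBoundary is never negative, so any underflow reaches this
        // point; the check no longer runs on every call.
        assert(_inUseTrackedMemoryBytes >= 0);
        std::int64_t chunks = _inUseTrackedMemoryBytes / _chunkSize;
        _lowerBoundary = chunks * _chunkSize;
        _upperBoundary = _lowerBoundary + _chunkSize;
        ++_reports;  // hypothetical hook: where a parent tracker would be updated
    }

    std::int64_t _chunkSize;
    std::int64_t _inUseTrackedMemoryBytes = 0;
    std::int64_t _peakTrackedMemoryBytes = 0;
    std::int64_t _lowerBoundary = 0;
    std::int64_t _upperBoundary;
    int _reports = 0;
};
```

For non-negative totals, leaving the cached range is the same event as the quotient changing, so this variant should report at the same points as a division-per-call check; only the cost of the common case differs. Batching, as in the third bullet, would be a separate change in the sorter (accumulating a local delta and calling add() less often), at the cost of the tracker seeing usage changes later.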