Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- qi-observability
- quick-tech-debt

Assigned Teams: Query Integration
Linked BF Score: 200
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In the sys-perf flamegraphs on master from BF-41814, SimpleMemoryUsageTracker::add accounts for ~1.72% of total CPU samples in the SortStage::doGetNext bounded sort path. This is despite chunking already being enabled. The overhead comes from the path that runs on every invocation even when no chunk boundary is crossed:

- _inUseTrackedMemoryBytes += diff
- tassert underflow check
- Peak memory comparison and update
- Integer division for chunk boundary check (_inUseTrackedMemoryBytes / _chunkSize)
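For reference, the per-call work listed above can be modelled roughly as follows. This is a minimal sketch, not the server's code: the class name SketchTracker, its members, and the plain assert standing in for tassert are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tracker described in this ticket.
// It only models the four steps that run on every add() call.
class SketchTracker {
public:
    explicit SketchTracker(int64_t chunkSize) : _chunkSize(chunkSize) {}

    void add(int64_t diff) {
        // 1. Unconditional counter update.
        _inUse += diff;
        // 2. Underflow check (plain assert used here in place of tassert).
        assert(_inUse >= 0);
        // 3. Peak comparison and update.
        if (_inUse > _peak) {
            _peak = _inUse;
        }
        // 4. Integer division to detect whether a chunk boundary was crossed.
        const int64_t chunk = _inUse / _chunkSize;
        if (chunk != _currentChunk) {
            // Rare path: this is where a coarser-grained update would be reported.
            _currentChunk = chunk;
            ++_chunkReports;
        }
    }

    int64_t inUse() const { return _inUse; }
    int64_t peak() const { return _peak; }
    int64_t chunkReports() const { return _chunkReports; }

private:
    int64_t _chunkSize;
    int64_t _inUse = 0;
    int64_t _peak = 0;
    int64_t _currentChunk = 0;
    int64_t _chunkReports = 0;
};
```

In this model, steps 1 through 4 execute on every call, and only the body of the final branch is skipped when the total stays inside the current chunk.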

The bounded sorter calls _memoryTracker.add() on every document added and every document returned, making this a very hot path.

Areas of interest to investigate:

- Whether the tassert check can be removed or made cheaper in release builds.
- Whether the integer division for chunk boundary detection can be replaced with a cheaper comparison.
- Whether call frequency can be reduced by batching updates or only tracking net deltas at a coarser granularity.
- Whether the peak tracking comparison can be deferred or sampled.
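One possible shape for the "cheaper comparison" idea is to cache the byte range of the current chunk and divide only when the running total leaves that range. The sketch below is an assumption-laden illustration, not a proposed patch: BoundsTracker and its members are hypothetical names, a plain assert stands in for tassert, and whether this is actually faster would need to be confirmed against the sys-perf workload.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: chunk boundary detection via cached bounds.
// The division runs only on the slow path, when the total leaves [_lower, _upper).
class BoundsTracker {
public:
    explicit BoundsTracker(int64_t chunkSize)
        : _chunkSize(chunkSize), _lower(0), _upper(chunkSize) {}

    void add(int64_t diff) {
        _inUse += diff;
        assert(_inUse >= 0);  // stands in for the tassert underflow check
        if (_inUse > _peak) {
            _peak = _inUse;
        }
        // Fast path: two comparisons, no division.
        if (_inUse >= _lower && _inUse < _upper) {
            return;
        }
        // Slow path: recompute the chunk index and refresh the cached bounds.
        _currentChunk = _inUse / _chunkSize;
        _lower = _currentChunk * _chunkSize;
        _upper = _lower + _chunkSize;
        ++_chunkReports;
    }

    int64_t inUse() const { return _inUse; }
    int64_t peak() const { return _peak; }
    int64_t currentChunk() const { return _currentChunk; }
    int64_t chunkReports() const { return _chunkReports; }

private:
    int64_t _chunkSize;
    int64_t _lower;
    int64_t _upper;
    int64_t _inUse = 0;
    int64_t _peak = 0;
    int64_t _currentChunk = 0;
    int64_t _chunkReports = 0;
};
```

The chunk index this produces matches _inUseTrackedMemoryBytes / _chunkSize after every call, so the change would be behavior-preserving in this model; the trade-off is two extra cached fields per tracker.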

Assignee: Unassigned
Reporter: Lee Maguire
Participants: Lee Maguire
Votes: 0
Watchers: 2

Created: Mar 03 2026 05:40:10 PM UTC
Updated: Mar 05 2026 07:41:18 PM UTC
