Investigate performance overhead of SimpleMemoryUsageTracker::add in bounded sort path


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration
    • 200

      In the sys-perf flamegraphs on master from BF-41814, SimpleMemoryUsageTracker::add accounts for ~1.72% of total CPU samples in the SortStage::doGetNext bounded sort path. This is despite chunking already being enabled. The overhead comes from the path that runs on every invocation even when no chunk boundary is crossed:

      • _inUseTrackedMemoryBytes += diff
      • tassert underflow check
      • Peak memory comparison and update
      • Integer division for chunk boundary check (_inUseTrackedMemoryBytes / _chunkSize)
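
      As a rough illustration of the per-call work listed above, here is a minimal sketch. It is not the MongoDB source: SketchTracker, its member names, the use of assert in place of tassert, and the counter standing in for a parent-tracker notification are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the tracker; names and behavior are assumptions,
// not the MongoDB implementation.
class SketchTracker {
public:
    explicit SketchTracker(int64_t chunkSize) : _chunkSize(chunkSize) {}

    void add(int64_t diff) {
        _inUseBytes += diff;              // running total, updated on every call
        assert(_inUseBytes >= 0);         // stand-in for the tassert underflow check
        if (_inUseBytes > _peakBytes) {   // peak comparison and update
            _peakBytes = _inUseBytes;
        }
        // Integer division runs on every call, even when the chunk is unchanged.
        const int64_t chunk = _inUseBytes / _chunkSize;
        if (chunk != _lastChunk) {        // rare case: a chunk boundary was crossed
            _lastChunk = chunk;
            ++_upstreamReports;           // placeholder for notifying a parent tracker
        }
    }

    int64_t inUseBytes() const { return _inUseBytes; }
    int64_t peakBytes() const { return _peakBytes; }
    int64_t upstreamReports() const { return _upstreamReports; }

private:
    int64_t _chunkSize;
    int64_t _inUseBytes = 0;
    int64_t _peakBytes = 0;
    int64_t _lastChunk = 0;
    int64_t _upstreamReports = 0;
};
```

      In this sketch, only the final branch is skipped when no boundary is crossed; the addition, the check, the peak comparison, and the division all run on every call.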

      The bounded sorter calls _memoryTracker.add() on every document added and every document returned, making this a very hot path.

      Areas of interest to investigate:

      • Whether the tassert check can be removed or made cheaper in release builds.
      • Whether the integer division for chunk boundary detection can be replaced with a cheaper comparison.
      • Whether call frequency can be reduced by batching updates or only tracking net deltas at a coarser granularity.
      • Whether the peak tracking comparison can be deferred or sampled.
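
      One possible shape for the cheaper boundary check is sketched below, assuming the tracker may cache the byte range of the current chunk. BoundsTracker and all of its members are hypothetical names for this example. Whether this is actually faster in SortStage would need to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: cache [_lo, _hi) for the current chunk so the common
// case needs two comparisons instead of an integer division.
class BoundsTracker {
public:
    explicit BoundsTracker(int64_t chunkSize)
        : _chunkSize(chunkSize), _hi(chunkSize) {}

    void add(int64_t diff) {
        _inUseBytes += diff;
        assert(_inUseBytes >= 0);  // stand-in for the tassert underflow check
        if (_inUseBytes > _peakBytes) {
            _peakBytes = _inUseBytes;
        }
        // Fast path: still inside the cached chunk range, so no division.
        if (_inUseBytes >= _lo && _inUseBytes < _hi) {
            return;
        }
        // Slow path: a boundary was crossed; recompute the cached range.
        _chunk = _inUseBytes / _chunkSize;
        _lo = _chunk * _chunkSize;
        _hi = _lo + _chunkSize;
        ++_boundaryCrossings;  // placeholder for notifying a parent tracker
    }

    int64_t chunkIndex() const { return _chunk; }
    int64_t peakBytes() const { return _peakBytes; }
    int64_t boundaryCrossings() const { return _boundaryCrossings; }

private:
    int64_t _chunkSize;
    int64_t _inUseBytes = 0;
    int64_t _peakBytes = 0;
    int64_t _chunk = 0;
    int64_t _lo = 0;  // inclusive lower bound of the current chunk
    int64_t _hi;      // exclusive upper bound of the current chunk
    int64_t _boundaryCrossings = 0;
};
```

      The cached chunk index always equals the in-use byte count divided by the chunk size, so boundary detection behaves the same as the division-based check; the division only runs when a boundary is crossed.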

            Assignee:
            Unassigned
            Reporter:
            Lee Maguire
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: