Skip zero-value atomic adds in CollectionIndexUsageTracker on index-only reads


    • Product Performance

      Problem

      On YCSB 100% read (128 threads, ARM64 Graviton), CollectionIndexUsageTrackerDecoration::recordCollectionIndexUsage accounts for 0.26% flat / 0.31% cumulative of total CPU. The function is called on every top-level query through endQueryOp and issues four atomic RMWs on shared counters (_sharedStats->_collectionScans, _sharedStats->_collectionScansNonTailable, and two global MetricBuilder<Counter64> counters in the decoration layer), plus three more under an isSystemDotProfile branch.

      For index-only queries (which account for 100% of YCSB reads on the _id index), the collectionScans and collectionScansNonTailable values passed into these calls are always zero. On ARM64, fetchAndAdd(0) is not a no-op at the hardware level: it emits an ldaxr/stlxr load-exclusive/store-exclusive pair that writes the same value back and invalidates the cache line in every other core's L1d cache. With 128 threads executing this path on every operation, the shared counter cache lines bounce between cores in proportion to core count, burning cycles on coherence traffic for counter updates that change nothing.
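
      For illustration, a minimal standalone C++ sketch of the effect (std::atomic stands in for MongoDB's AtomicWord wrapper; the function names here are hypothetical):

          #include <atomic>

          std::atomic<long long> sharedCounter{0};

          // Unconditional form: even when n == 0, the atomic RMW claims the
          // cache line in exclusive state and invalidates every other core's
          // copy, so "adding zero" still generates coherence traffic.
          void recordUnconditional(long long n) {
              sharedCounter.fetch_add(n);
          }

          // Guarded form: a local, highly predictable branch skips the RMW
          // entirely when there is nothing to add, so index-only reads never
          // touch the shared cache line.
          void recordGuarded(long long n) {
              if (n > 0)
                  sharedCounter.fetch_add(n);
          }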

      Solution

      Add local if (x > 0) guards in front of each zero-value atomic increment on the index-only read path. Three changes across two files:

      • CollectionIndexUsageTracker::recordCollectionScans() — guard the fetchAndAdd on _sharedStats->_collectionScans.
      • CollectionIndexUsageTracker::recordCollectionScansNonTailable() — guard the fetchAndAdd on _sharedStats->_collectionScansNonTailable.
      • CollectionIndexUsageTrackerDecoration::recordCollectionIndexUsage() — wrap the five Counter64::increment calls (two global, three inside the existing isSystemDotProfile branch) in a single compound guard if (collectionScans > 0 || collectionScansNonTailable > 0).
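
      A hedged sketch of the three changes, assuming the member and parameter names quoted above (the global Counter64 identifiers are placeholders; the actual metric names in the source may differ):

          void CollectionIndexUsageTracker::recordCollectionScans(long long collectionScans) {
              // Index-only reads pass 0 here; skip the shared-counter RMW entirely.
              if (collectionScans > 0)
                  _sharedStats->_collectionScans.fetchAndAdd(collectionScans);
          }

          void CollectionIndexUsageTracker::recordCollectionScansNonTailable(
                  long long collectionScansNonTailable) {
              if (collectionScansNonTailable > 0)
                  _sharedStats->_collectionScansNonTailable.fetchAndAdd(collectionScansNonTailable);
          }

          // In CollectionIndexUsageTrackerDecoration::recordCollectionIndexUsage:
          // one compound guard around all five Counter64 increments.
          if (collectionScans > 0 || collectionScansNonTailable > 0) {
              globalCollectionScans.increment(collectionScans);  // placeholder name
              globalCollectionScansNonTailable.increment(collectionScansNonTailable);  // placeholder name
              if (isSystemDotProfile) {
                  // ...the three existing profile-collection increments, unchanged...
              }
          }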

      The guard is a local comparison against a function parameter, effectively free when the branch is predictable (as it is on YCSB-style index-only reads, where the predicate is always false). The non-zero path (legitimate collection scans) is unchanged.

      The change follows the same pattern as merged SERVER-120293 (skip-zero-atomic-adds-in-finalize-operation-stats, +0.96%), which guarded ~30 zero-value atomic adds in finalizeOperationStats in an adjacent hot path.

            Assignee:
            Jawwad Asghar
            Reporter:
            Jawwad Asghar
            Votes:
            0
            Watchers:
            4
