The aggregateSizeCountDeltasInOplog process does redundant parsing on every oplog record


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Execution
    • Storage Execution 2026-05-11

      aggregateSizeCountDeltasInOplog is called on every checkpoint scan. For each record returned by the oplog cursor, it currently:

      • Runs the full IDL parse via OplogEntry::parse, materializing every field of the entry even though only op, ns, ui, m, o, and ts are
        actually consumed.
      • For applyOps entries, walks the inner-ops array twice: once via operationsOnFastCountCollections (which calls repl::ApplyOps::extractOperationsTo) to
        detect writes to the internal fast-count store, and again via extractSizeCountDeltasForApplyOps to extract size/count deltas. Each pass copies common fields into a
        fresh BSONObjBuilder per inner op and IDL-parses each result as an OplogEntry.
      • Constructs a NamespaceString eagerly for every entry, including for op-types that are immediately discarded (noops, container ops, most command types).
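
One way to picture the fix for the last two points is a single pass over the inner ops that checks the op type before building any namespace key, and records the fast-count-store hit in the same loop. The sketch below is standalone and illustrative only: `Op`, `Delta`, `ScanResult`, `scanInnerOps`, and the `fastCountNs` parameter are hypothetical stand-ins, not the server's BSON, `repl::OplogEntry`, or `NamespaceString` types. It also assumes, for illustration, that only inserts and deletes carry deltas and that writes to the fast-count store are flagged rather than counted.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical, already-decoded view of one inner op (not a real server type).
struct Op {
    char type;             // 'i' insert, 'd' delete, anything else is ignored here
    std::string_view ns;   // namespace as a non-owning view
    std::int64_t docSize;  // size in bytes of the affected document
};

struct Delta {
    std::int64_t count = 0;
    std::int64_t size = 0;
};

struct ScanResult {
    std::unordered_map<std::string, Delta> deltas;
    bool touchesFastCountStore = false;
};

// Single pass: detects fast-count-store writes and accumulates deltas together,
// instead of walking the inner-ops array once per question.
inline ScanResult scanInnerOps(const std::vector<Op>& ops, std::string_view fastCountNs) {
    ScanResult r;
    for (const Op& op : ops) {
        // Filter on op type first, so discarded ops never allocate a namespace key.
        if (op.type != 'i' && op.type != 'd') {
            continue;
        }
        // Assumption of this sketch: such writes are flagged and not counted.
        if (op.ns == fastCountNs) {
            r.touchesFastCountStore = true;
            continue;
        }
        // The owning key is built only for ops that actually contribute a delta.
        Delta& d = r.deltas[std::string(op.ns)];
        const std::int64_t sign = (op.type == 'i') ? 1 : -1;
        d.count += sign;
        d.size += sign * op.docSize;
    }
    return r;
}
```

The same ordering (cheap type check, then namespace construction) would apply to top-level entries as well as inner ops.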

      The cost is paid per oplog entry on the size/count read path and on every checkpoint, scaling with both oplog throughput and transaction size. Workloads with large prepared transactions or vectored inserts amplify the per-inner-op overhead.

      A separate but related helper, aggregateMultiOpSizeMetadata, also lives in this file and shares logic with the scan path. One of its callers operates on in-memory repl::ReplOperation objects that have no BSON form, so any consolidation work needs to keep that case zero-allocation on the prepare hot path.
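
A possible shape for that consolidation is to write the delta logic once against a small accessor interface and supply one adapter per representation, so the in-memory path never has to serialize. The following is a minimal sketch under stated assumptions: `InMemoryOp`, `SerializedOp`, the two view structs, and `applyOp` are hypothetical names introduced here, standing in for `repl::ReplOperation` and a BSON-backed op respectively, and only inserts and deletes are modeled.

```cpp
#include <cstdint>
#include <string_view>

struct SizeCountDelta {
    std::int64_t count = 0;
    std::int64_t size = 0;
};

// Hypothetical in-memory op with no serialized form.
struct InMemoryOp {
    char type;
    std::int64_t docSize;
};

// Hypothetical serialized op; fields are non-owning views into a buffer.
struct SerializedOp {
    std::string_view typeField;
    std::string_view doc;
};

// Adapters expose the same two accessors without copying or allocating.
struct InMemoryView {
    const InMemoryOp& op;
    char type() const { return op.type; }
    std::int64_t docSize() const { return op.docSize; }
};

struct SerializedView {
    const SerializedOp& op;
    // A missing type field is treated as a noop in this sketch.
    char type() const { return op.typeField.empty() ? 'n' : op.typeField.front(); }
    std::int64_t docSize() const { return static_cast<std::int64_t>(op.doc.size()); }
};

// Shared logic, resolved at compile time for each view type.
template <typename View>
void applyOp(const View& v, SizeCountDelta& d) {
    switch (v.type()) {
        case 'i':
            d.count += 1;
            d.size += v.docSize();
            break;
        case 'd':
            d.count -= 1;
            d.size -= v.docSize();
            break;
        default:
            break;  // other op types contribute nothing here
    }
}
```

Because the views hold only a reference and the template is instantiated per view type, the in-memory caller pays no heap allocation and no virtual dispatch in this sketch.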

      Note on planned tailable-cursor migration for checkpoint advance:

      Checkpoint advance is planned to migrate from a periodic full-tail scan to a tailable cursor (SERVER-121018) that streams oplog entries as they are written. This optimization remains valuable under that model. The per-entry parsing cost is paid every time an entry is observed, regardless of whether the work is batched at checkpoint time or amortized
      continuously, so reducing it lowers steady-state CPU and allocation pressure for every consumer of this code.

      Acceptance criteria:

      • Public interface of aggregateSizeCountDeltasInOplog is preserved.
      • No behavior change in the deltas it produces, the timestamps it advances, or the metrics it emits.
      • Measurable reduction in per-entry CPU time on a representative oplog workload (CRUD-heavy and transaction-heavy mixes).

            Assignee:
            Ernesto Rodriguez Reina
            Reporter:
            Ernesto Rodriguez Reina
