-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Storage Execution
-
Storage Execution 2026-05-11
aggregateSizeCountDeltasInOplog is called on every checkpoint scan. For each record returned by the oplog cursor, it currently:
- Runs the full IDL parse via OplogEntry::parse, materializing every field of the entry even though only op, ns, ui, m, o, and ts are actually consumed.
- For applyOps entries, walks the inner-ops array twice: once via operationsOnFastCountCollections (which calls repl::ApplyOps::extractOperationsTo) to detect writes to the internal fast-count store, and again via extractSizeCountDeltasForApplyOps to extract size/count deltas. Each pass copies common fields into a fresh BSONObjBuilder per inner op and IDL-parses each result as an OplogEntry.
- Constructs a NamespaceString eagerly for every entry, including for op types that are immediately discarded (noops, container ops, most command types).
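The three costs listed above suggest one possible shape for an optimized scan: read only the fields that are consumed, visit each inner op once, and build the namespace key only after the op type has passed the filter. The following is a minimal, self-contained C++ sketch of that shape, not the server's code. Field, RawOp, getField, scanInnerOps, the kFastCountNs string, and the size rule (length of the o field in bytes) are all hypothetical stand-ins; real code would operate on BSONObj and NamespaceString.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one raw inner op: a flat list of (name, value) fields.
struct Field {
    std::string_view name;
    std::string_view value;
};
using RawOp = std::vector<Field>;

struct Delta {
    std::int64_t count = 0;
    std::int64_t size = 0;
};

struct ScanResult {
    std::map<std::string, Delta> deltas;  // keyed by namespace
    bool touchedFastCountStore = false;
};

// Hypothetical name for the internal fast-count store.
constexpr std::string_view kFastCountNs = "config.fast_count_metadata";

// Looks up a single field by name without materializing the rest of the op.
inline std::string_view getField(const RawOp& op, std::string_view name) {
    for (const Field& f : op) {
        if (f.name == name) return f.value;
    }
    return {};
}

// One pass over the inner ops: detects fast-count writes and accumulates deltas together.
inline void scanInnerOps(const std::vector<RawOp>& innerOps, ScanResult& out) {
    for (const RawOp& op : innerOps) {
        const std::string_view type = getField(op, "op");
        if (type != "i" && type != "d") continue;  // discarded before ns is read

        const std::string_view ns = getField(op, "ns");
        assert(!ns.empty());  // inserts and deletes are expected to carry a namespace
        if (ns == kFastCountNs) {
            out.touchedFastCountStore = true;
            continue;
        }

        const auto docSize = static_cast<std::int64_t>(getField(op, "o").size());
        Delta& d = out.deltas[std::string(ns)];  // owning key is built only here
        if (type == "i") {
            d.count += 1;
            d.size += docSize;
        } else {
            d.count -= 1;
            d.size -= docSize;
        }
    }
}
```

In this sketch a noop costs a single field lookup, and the fast-count check and the delta extraction share one traversal instead of two.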
The cost is paid per oplog entry on the size/count read path and on every checkpoint, scaling with both oplog throughput and transaction size. Workloads with large prepared transactions or vectored inserts amplify the per-inner-op overhead.
A separate but related helper, aggregateMultiOpSizeMetadata, also lives in this file and shares logic with the scan path. One of its callers operates on in-memory repl::ReplOperation objects that have no BSON form, so any consolidation work needs to keep that case zero-allocation on the prepare hot path.
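One way to share the accumulation logic between the scan path and the in-memory caller, without forcing the in-memory case through BSON, is a function template over a small accessor interface. The sketch below assumes this approach for illustration only. InMemoryOp (standing in for repl::ReplOperation), SerializedOpView, OpKind, SizeCountDelta, and accumulate are hypothetical names, and the delta rule is simplified to inserts and deletes.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

struct SizeCountDelta {
    std::int64_t count = 0;
    std::int64_t size = 0;
};

enum class OpKind { kInsert, kDelete, kOther };

// Hypothetical in-memory operation with no serialized form.
struct InMemoryOp {
    OpKind kind;
    std::int64_t docBytes;

    OpKind opKind() const { return kind; }
    std::int64_t docSize() const { return docBytes; }
};

// Hypothetical non-owning view over an already-serialized op.
struct SerializedOpView {
    std::string_view opCode;    // "i", "d", or anything else
    std::string_view document;  // raw document bytes

    OpKind opKind() const {
        if (opCode == "i") return OpKind::kInsert;
        if (opCode == "d") return OpKind::kDelete;
        return OpKind::kOther;
    }
    std::int64_t docSize() const { return static_cast<std::int64_t>(document.size()); }
};

// Shared logic: works on any type exposing opKind() and docSize().
// Nothing here allocates; both adapters above are plain values or views.
template <typename Op>
void accumulate(const Op& op, SizeCountDelta& total) {
    assert(op.docSize() >= 0);
    switch (op.opKind()) {
        case OpKind::kInsert:
            total.count += 1;
            total.size += op.docSize();
            break;
        case OpKind::kDelete:
            total.count -= 1;
            total.size -= op.docSize();
            break;
        case OpKind::kOther:
            break;
    }
}
```

Because the template is resolved at compile time, the in-memory caller pays only for two inlined accessor calls per op, which keeps the prepare path free of builder or parse overhead in this model.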
Note on planned tailable-cursor migration for checkpoint advance:
Checkpoint advance is planned to migrate from a periodic full-tail scan to a tailable cursor (SERVER-121018) that streams oplog entries as they are written. This optimization remains valuable under that model. The per-entry parsing cost is paid every time an entry is observed regardless of whether the work is batched at checkpoint time or amortized continuously, and reducing it lowers steady-state CPU and allocation pressure for every consumer of this code.
Acceptance criteria:
- Public interface of aggregateSizeCountDeltasInOplog is preserved.
- No behavior change in the deltas it produces, the timestamps it advances, or the metrics it emits.
- Measurable reduction in per-entry CPU time on a representative oplog workload (CRUD-heavy and transaction-heavy mixes).
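The second and third criteria could be checked together with a small differential harness that runs the existing and the optimized implementation over the same workload, compares their outputs, and records wall-clock time for each. The sketch below is one hypothetical way to structure such a check. Totals, Workload, Scanner, Comparison, and compareScanners are illustrative names, and the workload encoding (a signed document size per op) is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <vector>

struct Totals {
    std::int64_t count = 0;
    std::int64_t size = 0;
    bool operator==(const Totals& other) const {
        return count == other.count && size == other.size;
    }
};

// Hypothetical workload encoding: positive value = insert of that many bytes,
// negative value = delete of that many bytes.
using Workload = std::vector<std::int64_t>;
using Scanner = std::function<Totals(const Workload&)>;

struct Comparison {
    bool sameOutput;
    std::chrono::nanoseconds referenceTime;
    std::chrono::nanoseconds candidateTime;
};

// Runs both scanners on the same input and reports equality plus elapsed time.
inline Comparison compareScanners(const Scanner& reference,
                                  const Scanner& candidate,
                                  const Workload& workload) {
    assert(reference && candidate);  // both callables must be set
    using Clock = std::chrono::steady_clock;
    using std::chrono::duration_cast;
    using std::chrono::nanoseconds;

    const auto t0 = Clock::now();
    const Totals expected = reference(workload);
    const auto t1 = Clock::now();
    const Totals actual = candidate(workload);
    const auto t2 = Clock::now();

    return Comparison{expected == actual,
                      duration_cast<nanoseconds>(t1 - t0),
                      duration_cast<nanoseconds>(t2 - t1)};
}
```

A single timed run is noisy, so a real measurement would repeat the comparison over CRUD-heavy and transaction-heavy inputs and look at the distribution rather than one sample.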
- is related to: SERVER-121018 Create tailable oplog cursor for replicated fast count
- Backlog
-