Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 9.0.0-rc0
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Storage Execution
Backwards Compatibility:
Fully Compatible
Sprint:
Storage Execution 2026-05-11, Storage Execution 2026-05-25, Storage Execution 2026-06-08, Storage Execution 2026-06-22
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

aggregateSizeCountDeltasInOplog is called on every checkpoint scan. For each record returned by the oplog cursor, it currently:

- Runs the full IDL parse via OplogEntry::parse, materializing every field of the entry even though only op, ns, ui, m, o, and ts are actually consumed.
- For applyOps entries, walks the inner-ops array twice: once via operationsOnFastCountCollections (which calls repl::ApplyOps::extractOperationsTo) to detect writes to the internal fast-count store, and again via extractSizeCountDeltasForApplyOps to extract size/count deltas. Each pass copies common fields into a fresh BSONObjBuilder per inner op and IDL-parses each result as an OplogEntry.
- Constructs a NamespaceString eagerly for every entry, including for op types that are immediately discarded (noops, container ops, most command types).
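The double walk and the eager namespace construction described above can be illustrated with a minimal, standalone sketch. It does not use the server's types: RawOp, SizeCountDelta, ScanResult, scanInnerOps, and the namespace string in kHypotheticalFastCountNs are all hypothetical names invented for illustration, and the size effect of updates is deliberately left out. The idea shown is only that detection of fast-count-store writes and delta extraction can share one loop, and that the owning namespace key is built only for ops that survive the op-type filter.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for one inner op of an applyOps entry. The real code
// reads these fields from BSON; none of these types are the server's own.
struct RawOp {
    char opType;           // 'i' insert, 'd' delete, 'u' update, 'n' noop
    std::string_view ns;   // namespace kept as an unparsed view
    std::int64_t docSize;  // size in bytes of the affected document
};

struct SizeCountDelta {
    std::int64_t size = 0;
    std::int64_t count = 0;
};

struct ScanResult {
    std::map<std::string, SizeCountDelta> deltas;
    bool touchedFastCountStore = false;
};

// Made-up placeholder name for the internal fast-count store (assumption).
constexpr std::string_view kHypotheticalFastCountNs = "internal.fast_count_store";

// Single pass over the inner ops: detection and delta extraction share one loop.
ScanResult scanInnerOps(const std::vector<RawOp>& ops) {
    ScanResult result;
    for (const RawOp& op : ops) {
        // Discard irrelevant op types before doing any namespace work.
        if (op.opType != 'i' && op.opType != 'u' && op.opType != 'd') {
            continue;
        }
        // Detection: any write to the fast-count store sets the flag.
        if (op.ns == kHypotheticalFastCountNs) {
            result.touchedFastCountStore = true;
            continue;
        }
        // Simplification for this sketch: updates contribute no delta here.
        if (op.opType == 'u') {
            continue;
        }
        // Only now is an owning key built for the namespace.
        SizeCountDelta& delta = result.deltas[std::string(op.ns)];
        const std::int64_t sign = (op.opType == 'i') ? 1 : -1;
        delta.size += sign * op.docSize;
        delta.count += sign;
    }
    return result;
}
```

In this sketch a noop never reaches the namespace comparison or the map lookup, which mirrors the goal of skipping NamespaceString construction for entries that are discarded anyway.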

The cost is paid per oplog entry on the size/count read path and on every checkpoint, scaling with both oplog throughput and transaction size. Workloads with large prepared transactions or vectored inserts amplify the per-inner-op overhead.

A separate but related helper, aggregateMultiOpSizeMetadata, also lives in this file and shares logic with the scan path. One of its callers operates on in-memory repl::ReplOperation objects that have no BSON form, so any consolidation work needs to keep that case zero-allocation on the prepare hot path.
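One way to share logic between a serialized path and an in-memory path without forcing the in-memory caller through a serialized form is a function template over small accessor overloads. The sketch below is a standalone illustration under that assumption, not the helper's actual design: InMemoryOp stands in for repl::ReplOperation, SerializedOp stands in for a BSON-backed inner op (here just a byte view whose first byte is the op type), and opTypeOf, docSizeOf, and aggregateDeltas are hypothetical names.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

struct SizeCountDelta {
    std::int64_t size = 0;
    std::int64_t count = 0;
};

// Hypothetical in-memory operation with no serialized form.
struct InMemoryOp {
    char opType;           // 'i' insert, 'd' delete, anything else ignored
    std::int64_t docSize;  // size in bytes of the affected document
};

// Hypothetical serialized operation: first byte is the op type, the rest is
// treated as the document payload.
struct SerializedOp {
    std::string_view bytes;
};

// Accessor overloads: each representation exposes the same two facts.
inline char opTypeOf(const InMemoryOp& op) { return op.opType; }
inline std::int64_t docSizeOf(const InMemoryOp& op) { return op.docSize; }

inline char opTypeOf(const SerializedOp& op) {
    return op.bytes.empty() ? 'n' : op.bytes.front();
}
inline std::int64_t docSizeOf(const SerializedOp& op) {
    return op.bytes.empty() ? 0 : static_cast<std::int64_t>(op.bytes.size()) - 1;
}

// Shared aggregation. It reads through the accessors only, so the in-memory
// instantiation performs no heap allocation and no serialization.
template <typename Op>
SizeCountDelta aggregateDeltas(const std::vector<Op>& ops) {
    SizeCountDelta total;
    for (const Op& op : ops) {
        const char type = opTypeOf(op);
        if (type != 'i' && type != 'd') {
            continue;
        }
        const std::int64_t sign = (type == 'i') ? 1 : -1;
        total.size += sign * docSizeOf(op);
        total.count += sign;
    }
    return total;
}
```

Because dispatch is resolved at compile time, each caller gets a loop specialized for its own representation; whether that matches the constraints of the real prepare path would need to be confirmed against the actual code.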

Note on planned tailable-cursor migration for checkpoint advance:

Checkpoint advance is planned to migrate from a periodic full-tail scan to a tailable cursor (SERVER-121018) that streams oplog entries as they are written. This optimization remains valuable under that model. The per-entry parsing cost is paid every time an entry is observed, regardless of whether the work is batched at checkpoint time or amortized continuously, and reducing it lowers steady-state CPU and allocation pressure for every consumer of this code.

Acceptance criteria:

- The public interface of aggregateSizeCountDeltasInOplog is preserved.
- No behavior change in the deltas it produces, the timestamps it advances, or the metrics it emits.
- Measurable reduction in per-entry CPU time on a representative oplog workload (CRUD-heavy and transaction-heavy mixes).

Is related to: SERVER-121018 Create tailable oplog cursor for replicated fast count (Closed)

There are no Sub-Tasks for this issue.

Assignee: Ernesto Rodriguez Reina
Reporter: Ernesto Rodriguez Reina
Participants: Ernesto Rodriguez Reina
Votes: 0
Watchers: 2

Created: Apr 28 2026 03:41:06 PM UTC
Updated: Jun 17 2026 07:49:17 PM UTC
Resolved: Jun 17 2026 07:49:17 PM UTC
