Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Replication
Product Performance
Problem
Off-CPU profiling of a 128-thread pure-read workload shows 0.19% cumulative off-CPU time (110M µs) in ReplicationCoordinatorImpl::getMyLastAppliedOpTime, caused by contention on the main ReplicationCoordinator _mutex. The call originates from computeOperationTime, which is invoked on every command response to populate the operationTime field. With the shadow value proposed under Solution, the shadow and the authoritative value are updated in the same mutex-held code path, so the maximum staleness is the time between two consecutive lines in _setMyLastAppliedOpTimeAndWallTime, which is sub-microsecond in practice.
This is part of a broader pattern of per-request ObservableMutex contention totaling ~0.74% of off-CPU time across VectorClock::_advanceTime, Top::record, getMyLastAppliedOpTime, gossipOut, and query_stats::registerRequest.
Solution
Add an AtomicWord<uint64_t> _lastAppliedTimestampShadow field to ReplicationCoordinatorImpl that mirrors the timestamp portion of lastAppliedOpTime. The shadow is updated under _mutex in _setMyLastAppliedOpTimeAndWallTime (the low-frequency write path) and read without the mutex via a new getMyLastAppliedTimestampRelaxed() method using loadRelaxed().
In computeOperationTime (service_entry_point_shard_role.cpp), replace replCoord->getMyLastAppliedOpTime().getTimestamp() with replCoord->getMyLastAppliedTimestampRelaxed(), eliminating the mutex acquisition from the per-request response path.
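The shape of the change can be sketched in standalone C++. This is a minimal illustration, not the server code: std::atomic and std::mutex stand in for AtomicWord and the coordinator mutex, and OpTimeSketch, ReplCoordSketch, computeOperationTimeLocked, and computeOperationTimeRelaxed are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for an OpTime: a timestamp plus an election term.
struct OpTimeSketch {
    std::uint64_t timestamp = 0;
    long long term = -1;
};

// Hypothetical stand-in for ReplicationCoordinatorImpl, reduced to one field.
class ReplCoordSketch {
public:
    // Low-frequency write path: the authoritative value and the shadow are
    // both updated while the mutex is held.
    void setMyLastAppliedOpTime(const OpTimeSketch& opTime) {
        std::lock_guard<std::mutex> lk(_mutex);
        _lastAppliedOpTime = opTime;
        _lastAppliedTimestampShadow.store(opTime.timestamp, std::memory_order_relaxed);
    }

    // Existing-style accessor: acquires the mutex on every call.
    OpTimeSketch getMyLastAppliedOpTime() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _lastAppliedOpTime;
    }

    // Proposed-style accessor: a single relaxed atomic load, no mutex.
    std::uint64_t getMyLastAppliedTimestampRelaxed() const {
        return _lastAppliedTimestampShadow.load(std::memory_order_relaxed);
    }

private:
    mutable std::mutex _mutex;
    OpTimeSketch _lastAppliedOpTime;
    std::atomic<std::uint64_t> _lastAppliedTimestampShadow{0};
};

// Per-request read path before the change: goes through the mutex.
inline std::uint64_t computeOperationTimeLocked(const ReplCoordSketch& rc) {
    return rc.getMyLastAppliedOpTime().timestamp;
}

// Per-request read path after the change: lock-free.
inline std::uint64_t computeOperationTimeRelaxed(const ReplCoordSketch& rc) {
    return rc.getMyLastAppliedTimestampRelaxed();
}
```

In a quiescent state both read paths return the same timestamp; the only difference is that the relaxed path never touches the mutex, so concurrent readers do not serialize on it.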
This follows the established _electionIdTermShadow pattern already in ReplicationCoordinatorImpl, which shadows the election term with AtomicWord<long long> for the same reason. A stale read returns a timestamp that is <= the authoritative value. Since operationTime is used by drivers for causal consistency (afterClusterTime), a slightly older value only affects visibility of other clients' concurrent writes, which is already non-deterministic due to replication lag. A client's own read-your-own-writes guarantees are unaffected because the driver tracks session writes separately.
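The staleness argument can be illustrated with a small standalone check. It is only a sketch under one stated assumption: the writer only ever advances the timestamp (the ticket does not discuss paths where the value could move backwards). The function name relaxedReaderSeesMonotonicValues is hypothetical, and std::atomic again stands in for AtomicWord.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Runs one writer that only advances a shadow timestamp and one reader that
// polls it with relaxed loads. Returns true if the reader never observed the
// value moving backwards and never observed a value above the final one.
inline bool relaxedReaderSeesMonotonicValues(std::uint64_t finalTs) {
    std::atomic<std::uint64_t> shadow{0};
    bool ok = true;

    std::thread reader([&] {
        std::uint64_t prev = 0;
        // Poll until the final value becomes visible to this thread.
        while (prev < finalTs) {
            std::uint64_t cur = shadow.load(std::memory_order_relaxed);
            if (cur < prev || cur > finalTs) {
                ok = false;  // would indicate a backwards or out-of-range read
                break;
            }
            prev = cur;
        }
    });

    std::thread writer([&] {
        // Assumption: the timestamp is only ever advanced.
        for (std::uint64_t ts = 1; ts <= finalTs; ++ts) {
            shadow.store(ts, std::memory_order_relaxed);
        }
    });

    writer.join();
    reader.join();  // join makes the reader's write to `ok` visible here
    return ok;
}
```

Even with relaxed ordering, reads of a single atomic variable by one thread follow that variable's modification order, so a reader may lag behind the writer but does not see an older value after a newer one. That matches the claim that a stale read only returns a timestamp at or below the authoritative value.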
Is related to: SERVER-113363 Make ReplicationCoordinatorImpl public accessors lock-free (Closed)