- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Replication
Summary
_populateUnsetWriteConcernOptionsSyncMode is called while holding _mutex in awaitReplication, but it only reads _rsConfig.getWriteConcernMajorityShouldJournal(), which is protected by its own WriteRarelyRWMutex. Moving the call ahead of the lock_guard shrinks the critical section to just _startWaitingForReplication.
Background
In replication_coordinator_impl.cpp, awaitReplication acquires _mutex and then calls _populateUnsetWriteConcernOptionsSyncMode before _startWaitingForReplication. _populateUnsetWriteConcernOptionsSyncMode only reads getWriteConcernMajorityShouldJournal(), a value that appears to change only on replSetReconfig. This value is already independently protected by a WriteRarelyRWMutex, and the existing _getReplSetConfig() helper shows how to read it without taking _mutex, via a thread-local snapshot.
There is no time-of-check-to-time-of-use hazard because _checkIfWriteConcernCanBeSatisfied re-validates under the lock inside _startWaitingForReplication.
It looks like the same issue exists in awaitReplicationAsyncNoWTimeout.
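The re-validation argument can be illustrated with a minimal C++ sketch. Every name in it (CoordinatorSketch, awaitWithHook, the node-count check) is a hypothetical stand-in rather than the real ReplicationCoordinatorImpl code. It assumes only that the satisfiability check runs under _mutex against the current config, and that reconfig holds both locks while writing. The hook simulates a reconfig landing between the pre-lock read and the lock acquisition.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical stand-ins for illustration only; not MongoDB types.
struct ConfigSketch {
    bool majorityShouldJournal = true;
    int dataBearingNodes = 3;
};

enum class WaitStatus { kWaiting, kUnsatisfiable };

class CoordinatorSketch {
public:
    // Assumption: a config write holds both locks (_mutex first, then the config lock).
    void reconfig(const ConfigSketch& cfg) {
        std::lock_guard<std::mutex> lk(_mutex);
        std::lock_guard<std::mutex> cfgLk(_configMutex);
        _config = cfg;
    }

    // betweenHook runs after the pre-lock read and before _mutex is taken,
    // which is the window a time-of-check-to-time-of-use hazard would need.
    WaitStatus awaitWithHook(int requiredNodes, const std::function<void()>& betweenHook) {
        const bool journal = readShouldJournal();  // pre-lock read; may go stale
        betweenHook();
        std::lock_guard<std::mutex> lk(_mutex);
        return startWaiting(lk, requiredNodes, journal);
    }

private:
    // Takes only the config lock, never _mutex.
    bool readShouldJournal() {
        std::lock_guard<std::mutex> cfgLk(_configMutex);
        return _config.majorityShouldJournal;
    }

    // Caller holds _mutex, so reading _config here sees the current config.
    WaitStatus startWaiting(const std::lock_guard<std::mutex>&, int requiredNodes, bool journal) {
        assert(requiredNodes > 0);
        if (requiredNodes > _config.dataBearingNodes) {
            return WaitStatus::kUnsatisfiable;  // re-validation catches the reconfig
        }
        _lastJournalFlag = journal;
        return WaitStatus::kWaiting;
    }

    std::mutex _mutex;        // stand-in for the coordinator's _mutex
    std::mutex _configMutex;  // stand-in for the WriteRarelyRWMutex
    ConfigSketch _config;
    bool _lastJournalFlag = false;
};
```

In this sketch, a reconfig that shrinks the set inside the window is still rejected, because the check reads the config under _mutex instead of trusting anything read before the lock.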
Proposed fix
Move the _populateUnsetWriteConcernOptionsSyncMode call (or an equivalent read of getWriteConcernMajorityShouldJournal()) ahead of the stdx::lock_guard, reading the config through _getReplSetConfig() instead of under _mutex. The critical section then contains only _startWaitingForReplication.
Apply the same change to awaitReplicationAsyncNoWTimeout.
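A minimal, self-contained C++ sketch of the proposed shape follows. It is not the real replication_coordinator_impl.cpp code. CoordinatorSketch, WriteConcernSketch, ReplSetConfigSketch, installConfig and waiterCount are hypothetical stand-ins, std::mutex stands in for both stdx::mutex and WriteRarelyRWMutex, and the sync-mode rule (journal only for majority writes when the config flag is set) is an assumption about what _populateUnsetWriteConcernOptionsSyncMode does.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these are the real MongoDB types.
enum class SyncMode { kUnset, kNone, kJournal };

struct WriteConcernSketch {
    bool majority = false;
    SyncMode syncMode = SyncMode::kUnset;
};

struct ReplSetConfigSketch {
    bool writeConcernMajorityShouldJournal = true;
};

class CoordinatorSketch {
public:
    // Rare writer (reconfig): publish a new immutable config snapshot.
    void installConfig(const ReplSetConfigSketch& cfg) {
        auto next = std::make_shared<const ReplSetConfigSketch>(cfg);
        std::lock_guard<std::mutex> lk(_configMutex);
        _config = std::move(next);
    }

    // Proposed shape: resolve the sync mode before taking _mutex, so the
    // critical section covers only the waiter registration.
    SyncMode awaitReplication(const WriteConcernSketch& wc) {
        const WriteConcernSketch resolved = populateUnsetSyncMode(*getConfigSnapshot(), wc);
        std::lock_guard<std::mutex> lk(_mutex);
        return startWaiting(lk, resolved);
    }

    std::size_t waiterCount() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _waiters.size();
    }

private:
    // Stand-in for _getReplSetConfig(): touches only the config lock.
    std::shared_ptr<const ReplSetConfigSketch> getConfigSnapshot() {
        std::lock_guard<std::mutex> lk(_configMutex);
        assert(_config);
        return _config;
    }

    // Pure function of (config, wc); needs no coordinator state.
    static WriteConcernSketch populateUnsetSyncMode(const ReplSetConfigSketch& cfg,
                                                    WriteConcernSketch wc) {
        if (wc.syncMode == SyncMode::kUnset) {
            const bool journal = wc.majority && cfg.writeConcernMajorityShouldJournal;
            wc.syncMode = journal ? SyncMode::kJournal : SyncMode::kNone;
        }
        return wc;
    }

    // The lock_guard parameter documents that the caller must hold _mutex.
    SyncMode startWaiting(const std::lock_guard<std::mutex>&, const WriteConcernSketch& wc) {
        _waiters.push_back(wc);
        return wc.syncMode;
    }

    std::mutex _mutex;        // stand-in for the coordinator's _mutex
    std::mutex _configMutex;  // stand-in for the WriteRarelyRWMutex
    std::shared_ptr<const ReplSetConfigSketch> _config =
        std::make_shared<const ReplSetConfigSketch>();
    std::vector<WriteConcernSketch> _waiters;
};
```

Because populateUnsetSyncMode depends only on the config snapshot and the caller's write concern, nothing it touches needs the coordinator lock. The same reordering would apply to an asynchronous variant of the wait function.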
Expected impact
Reduces _mutex hold time on every write-concern wait. This should in turn reduce the time that optime advancement callbacks and other hot-path operations spend blocked off-CPU waiting for the same lock.
Acceptance criteria
- _populateUnsetWriteConcernOptionsSyncMode is no longer called under _mutex in awaitReplication or awaitReplicationAsyncNoWTimeout
- Existing replication correctness tests pass
- No regression in write-latency tail percentiles