Move _populateUnsetWriteConcernOptionsSyncMode outside _mutex in awaitReplication


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Replication

      Summary

      _populateUnsetWriteConcernOptionsSyncMode is called while _mutex is held in awaitReplication, but it only reads _rsConfig.getWriteConcernMajorityShouldJournal(), which is protected by its own WriteRarelyRWMutex. Moving the call ahead of the lock_guard shrinks the critical section to just _startWaitingForReplication.

      Background

      In replication_coordinator_impl.cpp, awaitReplication acquires _mutex and then calls _populateUnsetWriteConcernOptionsSyncMode before _startWaitingForReplication. _populateUnsetWriteConcernOptionsSyncMode only reads getWriteConcernMajorityShouldJournal(), a seemingly near-immutable property that changes only on replSetReconfig. This value is already independently protected by a WriteRarelyRWMutex, and the existing _getReplSetConfig() helper demonstrates the correct lockless access pattern via a thread-local snapshot.

      There is no time-of-check-to-time-of-use hazard because _checkIfWriteConcernCanBeSatisfied re-validates under the lock inside _startWaitingForReplication.

      It looks like the same issue exists in awaitReplicationAsyncNoWTimeout.

      Proposed fix

      Move the _populateUnsetWriteConcernOptionsSyncMode call (or an equivalent read of getWriteConcernMajorityShouldJournal()) ahead of the stdx::lock_guard, obtaining the config through _getReplSetConfig() rather than reading _rsConfig under _mutex. The critical section then contains only _startWaitingForReplication.

      Apply the same change to awaitReplicationAsyncNoWTimeout.
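
      The reordering can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual server code: Config, Coordinator, WriteConcern, SyncMode, awaitBefore, awaitAfter, populateSyncMode and startWaiting are hypothetical stand-ins, std::shared_mutex stands in for WriteRarelyRWMutex, and the sync-mode rule shown is an assumption about what the real helper does.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-ins for illustration only; not the MongoDB types.
enum class SyncMode { kUnset, kNone, kJournal };

struct WriteConcern {
    bool majority = true;
    SyncMode syncMode = SyncMode::kUnset;
};

// Config guarded by its own reader/writer lock, independent of the
// coordinator mutex (std::shared_mutex stands in for WriteRarelyRWMutex).
class Config {
public:
    bool majorityShouldJournal() const {
        std::shared_lock<std::shared_mutex> lk(_rw);
        return _journal;
    }
    void setMajorityShouldJournal(bool v) {
        std::unique_lock<std::shared_mutex> lk(_rw);
        _journal = v;
    }

private:
    mutable std::shared_mutex _rw;
    bool _journal = true;
};

class Coordinator {
public:
    // Current shape: the config read happens inside the critical section.
    SyncMode awaitBefore(WriteConcern wc) {
        std::lock_guard<std::mutex> lk(_mutex);
        populateSyncMode(wc);
        return startWaiting(lk, wc);
    }

    // Proposed shape: resolve the sync mode first, then take the mutex
    // only for the step that touches coordinator state.
    SyncMode awaitAfter(WriteConcern wc) {
        populateSyncMode(wc);
        std::lock_guard<std::mutex> lk(_mutex);
        return startWaiting(lk, wc);
    }

    Config config;

private:
    // Assumed rule: an unset sync mode becomes kJournal for majority
    // writes when the config asks for journaling, otherwise kNone.
    void populateSyncMode(WriteConcern& wc) const {
        if (wc.syncMode != SyncMode::kUnset) {
            return;
        }
        wc.syncMode = (wc.majority && config.majorityShouldJournal())
            ? SyncMode::kJournal
            : SyncMode::kNone;
    }

    // Requires the caller to hold _mutex; the unused lock_guard parameter
    // documents that requirement.
    SyncMode startWaiting(const std::lock_guard<std::mutex>&, const WriteConcern& wc) {
        assert(wc.syncMode != SyncMode::kUnset);
        ++_waiters;
        return wc.syncMode;
    }

    std::mutex _mutex;
    int _waiters = 0;
};
```

      In this sketch both variants return the same sync mode for the same input; the only difference is that awaitAfter never takes the config's reader lock while holding the coordinator mutex.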

      Expected impact

      Reduces _mutex hold time on every write-concern wait, which should reduce contention-induced off-CPU time for optime advancement callbacks and other hot-path operations competing for the same lock.

      Acceptance criteria

      • _populateUnsetWriteConcernOptionsSyncMode is no longer called under _mutex in awaitReplication or awaitReplicationAsyncNoWTimeout
      • Existing replication correctness tests pass
      • No regression on write latency tail percentiles

            Assignee:
            Unassigned
            Reporter:
            Ger Hartnett
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: