Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- repl-shortlist

Assigned Teams:

Replication
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

As the number of operations waiting for replication waiters increases, the time required to wake all waiters increases linearly. This creates the following positive feedback loop, which results in lower overall throughput as the arrival rate increases:

A secondary sends the updatePosition command to the primary. On the primary, the updatePosition command advances the stable timestamp.
- Holds the ReplicationCoordinator mutex
- Traverses eligible waiters to wake them up
Meanwhile, new writes cannot be replicated because the ReplicationCoordinator mutex must be acquired to make new writes visible to secondaries
- This temporarily prevents new data from being replicated, which results in a larger batches of data being replicated.
- These larger batches take longer to process, which this increases replication wait latencies and the number of waiters.

I described the effects of a similar feedback loop here, but this one is more impactful because the latency of waking up N operations is significantly higher than the latency of the others, because it requires N syscalls and can quickly cap throughput.

The solution here likely requires decoupling the waiting and waking from the advancement of replication timestamps.

duplicates

SERVER-116524 Mitigate the effects of a long repl waiter list for eligible majority writes

Open

is related to

SERVER-84449 High WiredTiger session concurrency can increase replication write latency

Open

Assignee:: Unassigned
Reporter:: Louis Williams
Participants:: Louis Williams
Votes:: 0 Vote for this issue
Watchers:: 19 Start watching this issue

Created:: Nov 05 2025 07:59:04 PM UTC
Updated:: Mar 10 2026 08:31:22 PM UTC
Resolved:: Mar 10 2026 08:31:17 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates