Replication scales poorly with number of waiters

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      As the number of operations waiting for replication waiters increases, the time required to wake all waiters increases linearly. This creates the following positive feedback loop, which results in lower overall throughput as the arrival rate increases:

      • A secondary sends the updatePosition command to the primary. On the primary, the updatePosition command advances the stable timestamp.
        • Holds the ReplicationCoordinator mutex
        • Traverses eligible waiters to wake them up
      • Meanwhile, new writes cannot be replicated because the ReplicationCoordinator mutex must be acquired to make new writes visible to secondaries
        • This temporarily prevents new data from being replicated, which results in a larger batches of data being replicated.
        • These larger batches take longer to process, which this increases replication wait latencies and the number of waiters.

      I described the effects of a similar feedback loop here, but this one is more impactful because the latency of waking up N operations is significantly higher than the latency of the others, because it requires N syscalls and can quickly cap throughput.

      The solution here likely requires decoupling the waiting and waking from the advancement of replication timestamps.

            Assignee:
            Unassigned
            Reporter:
            Louis Williams
            Votes:
            0 Vote for this issue
            Watchers:
            11 Start watching this issue

              Created:
              Updated: