[SERVER-16952] Optimize oplog reader batching Created: 20/Jan/15  Updated: 06/Dec/22  Resolved: 09/Mar/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Participants:

Description

It is likely that batches are smaller because the primary can write with higher concurrency and lower latency, causing the replicating member to return smaller batches of documents during replication.

If this is true, it will lower replication latency ("eventual consistency") but increase network traffic and load, in both CPU and contention, on both members.

There are other reasons the batches may be smaller, including:

  • Change in lock acquisition (fairness)
  • Yielding and interleave behavior
  • Replication changes during re-write

To better analyze this behavior, we should be able to look at the replication network metrics (metrics.repl.network.getmores and metrics.repl.network.ops), which record the reader side of replication.
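
One rough way to use those metrics is to divide ops by getmores to estimate the average batch size. The sketch below assumes two serverStatus documents sampled some time apart on the replicating member (the counters are cumulative, so the difference between samples is what matters). The helper name interval_batch_stats and the sample numbers are made up for illustration; only the field paths under metrics.repl.network come from serverStatus.

```python
from typing import Any, Dict, Optional


def _repl_network(server_status: Dict[str, Any]) -> Dict[str, Any]:
    """Return the metrics.repl.network sub-document of a serverStatus result."""
    return server_status["metrics"]["repl"]["network"]


def interval_batch_stats(
    before: Dict[str, Any], after: Dict[str, Any]
) -> Dict[str, Optional[float]]:
    """Estimate oplog batch size between two serverStatus samples.

    Hypothetical helper: it only does arithmetic on the cumulative counters
    getmores.num, getmores.totalMillis and ops.
    """
    b, a = _repl_network(before), _repl_network(after)
    getmores = a["getmores"]["num"] - b["getmores"]["num"]
    ops = a["ops"] - b["ops"]
    millis = a["getmores"]["totalMillis"] - b["getmores"]["totalMillis"]
    if getmores <= 0:
        # No getmores in the interval (or the counters were reset by a restart).
        return {"ops_per_getmore": None, "ms_per_getmore": None}
    return {
        "ops_per_getmore": ops / getmores,
        "ms_per_getmore": millis / getmores,
    }


# Made-up sample documents; in practice these could come from running the
# serverStatus command on the secondary, e.g. via a driver.
sample_before = {"metrics": {"repl": {"network": {
    "getmores": {"num": 1000, "totalMillis": 5000}, "ops": 20000}}}}
sample_after = {"metrics": {"repl": {"network": {
    "getmores": {"num": 1500, "totalMillis": 6000}, "ops": 22500}}}}

stats = interval_batch_stats(sample_before, sample_after)
print(stats)
```

A falling ops_per_getmore across versions or workloads would be consistent with the smaller-batch hypothesis above, though it would not by itself say which of the listed causes is responsible.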

