[SERVER-83182] Avoid the delay in reporting self opTimes due to the constraint of at most one in-flight updatePosition request Created: 13/Nov/23 Updated: 05/Feb/24 |
|
| Status: | In Code Review |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Wenbin Zhu | Assignee: | Jiawei Yang |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | PM-3489-Milestone-MiscImprovement-CP, PM-3489-perf-testing-required | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Replication |
||||||||
| Sprint: | Repl 2024-01-08, Repl 2024-01-22, Repl 2024-02-05, Repl 2024-02-19 | ||||||||
| Participants: | |||||||||
| Description |
|
Replication allows only one in-flight replSetUpdatePosition request to the upstream node at a time. The constraint was originally introduced so that, when chaining is enabled, we avoid a flood of forwarded replSetUpdatePosition messages. However, it also means that we can unnecessarily delay reporting our own opTime changes. For example, after applying each batch, a secondary updates the lastApplied opTime and then asynchronously flushes the journal and updates the lastDurable opTime; either update can trigger a replSetUpdatePosition request to the upstream. As a result, the request triggered by the lastDurable change can be blocked behind the one triggered by the lastApplied change, adding latency to {j: true} majority commit acknowledgement. We could relax the constraint when the replSetUpdatePosition is triggered by the node itself rather than forwarded from a downstream node. This may still not be perfect when nodes are chained, but it could be sufficient in the short term. We should perf test the change to see whether it yields an improvement. |
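The blocking behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: the `Reporter` class, the `relax_for_self` flag, and the coalescing of queued triggers into a single follow-up request are all hypothetical names and simplifications introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reporter:
    """Toy model of a node sending position updates upstream (hypothetical)."""

    relax_for_self: bool = False  # assumed knob: let self-triggered updates overlap
    in_flight: int = 0            # number of outstanding requests
    pending: bool = False         # a trigger arrived while a request was in flight
    sent: list = field(default_factory=list)

    def trigger(self, reason: str, forwarded: bool = False) -> bool:
        """Try to send an update; return True if it went out immediately."""
        may_overlap = self.relax_for_self and not forwarded
        if self.in_flight == 0 or may_overlap:
            self.in_flight += 1
            self.sent.append(reason)
            return True
        # Blocked behind the in-flight request: remember that a send is needed.
        self.pending = True
        return False

    def on_response(self) -> None:
        """An outstanding request completed; flush one queued update if any."""
        self.in_flight -= 1
        if self.pending and self.in_flight == 0:
            self.pending = False
            self.in_flight += 1
            self.sent.append("coalesced")


# Strict mode: the lastDurable update waits for the lastApplied round trip.
strict = Reporter()
strict.trigger("lastApplied")
strict.trigger("lastDurable")   # blocked until on_response() is called

# Relaxed mode: self-triggered updates go out without waiting.
relaxed = Reporter(relax_for_self=True)
relaxed.trigger("lastApplied")
relaxed.trigger("lastDurable")  # sent immediately
```

In the strict case the durable position reaches the upstream only after a full round trip of the earlier request, which is the extra latency the ticket is concerned with. Forwarded updates stay subject to the limit in both modes.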
| Comments |
| Comment by Wenbin Zhu [ 12/Dec/23 ] |
|
Another idea from mathias@mongodb.com: "we could also possibly delay sending updates that only change applied but not durable, so that when durable advances we can send an update immediately". |
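One possible reading of this idea is a small decision function, sketched below. Everything here is an assumption made for illustration: the `Position` type, the `classify` function, and the use of plain integers in place of real opTimes are hypothetical, and the sketch does not say how long an applied-only update may be held back.

```python
from typing import NamedTuple


class Position(NamedTuple):
    """Simplified stand-in for a node's reported opTimes (integers, not real opTimes)."""

    applied: int
    durable: int


def classify(last_sent: Position, current: Position) -> str:
    """Decide what to do with a position change relative to the last one sent.

    Returns "send" when durable advanced, "defer" when only applied advanced,
    and "noop" when nothing advanced.
    """
    if current.durable > last_sent.durable:
        return "send"
    if current.applied > last_sent.applied:
        return "defer"
    return "noop"


last_sent = Position(applied=10, durable=10)
print(classify(last_sent, Position(applied=12, durable=10)))  # applied-only change
print(classify(last_sent, Position(applied=12, durable=12)))  # durable advanced
```

Holding back the applied-only update keeps the single request slot free, so the update carrying the new durable position is not queued behind it.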
| Comment by Wenbin Zhu [ 13/Nov/23 ] |
|
We'd like to first do a quick POC with perf tests and see whether there is a noticeable improvement. |