[SERVER-35420] Remove stable optime candidates list from ReplicationCoordinator Created: 05/Jun/18 Updated: 06/Dec/22 Resolved: 22/Jan/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Replication |
| Description |
|
Once we are allowed to do point-in-time (PIT) reads on secondaries at timestamps that fall in the middle of previous secondary application batches, we'll be able to get rid of ReplicationCoordinatorImpl::_stableOpTimeCandidates. At that point the stable timestamp can simply be the minimum of the lastApplied optime on this node and the replication majority commit point. |
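The simplified rule proposed in the description can be sketched as follows. This is only an illustration, not the server's code: timestamps are modeled as plain integers, and the function name `proposed_stable_timestamp` and its parameters are hypothetical.

```python
def proposed_stable_timestamp(last_applied: int, majority_commit_point: int) -> int:
    """Return the stable timestamp under the rule proposed in the description.

    Assumption: timestamps are modeled as plain integers. The real server
    uses optime and timestamp types, which this sketch does not reproduce.
    """
    # The stable timestamp may not run ahead of what this node has applied,
    # nor ahead of what a majority of the set has committed.
    return min(last_applied, majority_commit_point)


# A node that has applied up to 120 while the majority commit point is 100
# would use 100; a lagging node that has applied only up to 90 would use 90.
stable_when_ahead = proposed_stable_timestamp(120, 100)
stable_when_lagging = proposed_stable_timestamp(90, 100)
```

With this rule no list of candidates needs to be kept, because both inputs are values the node already tracks.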
| Comments |
| Comment by Tess Avitabile (Inactive) [ 02/Jan/20 ] |
|
I believe we also need this list for sets with enableMajorityReadConcern:false. Once we remove that option, I would be interested in exploring removing the candidates list. I think it would be worth testing the perf impact. |
| Comment by Judah Schvimer [ 02/Jan/20 ] |
|
From conversation with milkie on the initial sync semantics design: I think we still have candidates for replica sets with only one voting node. The replication commit point can advance ahead of the all_durable timestamp, but the stable timestamp must be less than or equal to the all_durable timestamp. all_durable is not guaranteed to be a timestamp in the oplog, but the stable timestamp must be a timestamp in the oplog. The candidate list is a lot of complexity for what I'd consider an edge case. I think we could make sets with a single voting node hold the replication commit point behind the all_durable timestamp and remove the candidates list. This would add a perf penalty in that edge case, but that's almost certainly worth the complexity removal and the perf penalty of maintaining the candidates list. Looking in the codebase I no longer see any places where we use the candidate list outside of choosing the stable timestamp, so I think it is worth considering removing it in the future. |
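To make the two constraints in this comment concrete, here is a minimal sketch of choosing a stable timestamp from a candidates list. It assumes integer timestamps and a sorted list of candidates that are all real oplog timestamps. The name `choose_stable_timestamp` and its signature are hypothetical and are not taken from the codebase.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def choose_stable_timestamp(
    candidates: Sequence[int], commit_point: int, all_durable: int
) -> Optional[int]:
    """Pick the newest candidate that satisfies both constraints.

    Assumptions: `candidates` is sorted ascending and holds only timestamps
    that exist in the oplog; all timestamps are plain integers.
    """
    # The stable timestamp must not exceed the commit point or all_durable.
    limit = min(commit_point, all_durable)
    # all_durable may not itself be an oplog timestamp, so take the greatest
    # candidate at or below the limit rather than the limit itself.
    idx = bisect_right(candidates, limit)
    return candidates[idx - 1] if idx > 0 else None


# Single voting node case: the commit point (35) is ahead of all_durable (27),
# and 27 is not an oplog timestamp, so the result falls back to 20.
chosen = choose_stable_timestamp([10, 20, 30, 40], commit_point=35, all_durable=27)
```

If the commit point were instead held at or behind all_durable, as the comment suggests for sets with a single voting node, the commit point alone would bound the choice.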
| Comment by Gregory McKeon (Inactive) [ 22/Jan/19 ] |
|
We now use this list in several places, so we're leaving it. |
| Comment by Judah Schvimer [ 26/Sep/18 ] |
|
This probably has to be the lesser of the "all committed" timestamp and the "majority commit point", but we must be careful because the "all committed" timestamp can be a timestamp that's not in the oplog, which is not a valid "stable timestamp". |