[SERVER-49534] Remove ReplicaSetAwareService::onStepUpBegin Created: 15/Jul/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-49533 Remove blocking work from Balancer's ... Closed
Related
related to SERVER-49532 Remove OperationContext argument from... Backlog
Assigned Teams:
Replication
Participants:

 Description   

External services shouldn't have to know about the different phases of stepUp. Rather than having both onStepUpBegin and onStepUpComplete, the ReplicaSetAwareService interface should expose a single onStepUp method, which is the equivalent of the current onStepUpComplete.
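
To illustrate the proposed shape, here is a minimal sketch of an interface with a single step-up hook. All names here (ReplicaSetAwareServiceSketch, RecordingService, notifyStepUp) are hypothetical, and the signatures are simplified assumptions; this is not the actual ReplicaSetAwareService declaration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the interface discussed in this ticket.
// The real ReplicaSetAwareService has more hooks and different signatures.
class ReplicaSetAwareServiceSketch {
public:
    virtual ~ReplicaSetAwareServiceSketch() = default;

    // Single step-up hook, playing the role of the current onStepUpComplete.
    virtual void onStepUp(long long term) = 0;
    virtual void onStepDown() = 0;
};

// Hypothetical service that only records the callbacks it receives.
class RecordingService : public ReplicaSetAwareServiceSketch {
public:
    void onStepUp(long long term) override {
        events.push_back("stepUp:" + std::to_string(term));
    }
    void onStepDown() override {
        events.push_back("stepDown");
    }

    std::vector<std::string> events;
};

// Hypothetical notifier: each service hears about a step-up exactly once,
// with no separate "begin" phase to reason about.
void notifyStepUp(const std::vector<ReplicaSetAwareServiceSketch*>& services,
                  long long term) {
    for (auto* service : services) {
        assert(service != nullptr);
        service->onStepUp(term);
    }
}
```

With this shape, an implementor overrides one method per transition and does not need to know how replication sequences the phases of step-up internally.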



 Comments   
Comment by Kaloian Manassiev [ 05/Aug/20 ]

siyuan.zhou, this is correct.

Comment by Siyuan Zhou [ 04/Aug/20 ]

Discussed with kaloian.manassiev and spencer offline. kaloian.manassiev, correct me if I'm wrong: we needed onStepUpBegin() to be separate from onStepUpComplete() because it joins a thread, which would deadlock if done under the RSTL, and the RSTL is held while onStepUpComplete() runs. Thus we cannot move the balancer's onStepUpBegin into onStepUpComplete. As a result, this ticket depends on the Balancer change in SERVER-49533.
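
The ordering constraint can be sketched as follows, assuming the thread being joined may itself need the lock before it can exit. The names rstlSketch, OldWorker and stepUpSketch are hypothetical, and a std::mutex stands in for the RSTL, which is not a plain mutex in the server.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the RSTL; the real lock is not a std::mutex.
std::mutex rstlSketch;

// Hypothetical worker left over from a previous term as primary.
class OldWorker {
public:
    void start() {
        assert(!_thread.joinable());
        _thread = std::thread([this] {
            // Assumption: the worker needs the lock briefly before it can exit.
            std::lock_guard<std::mutex> lk(rstlSketch);
            finished = true;
        });
    }

    void join() {
        if (_thread.joinable()) {
            _thread.join();
        }
    }

    bool finished = false;

private:
    std::thread _thread;
};

// Safe ordering: join outside the lock (the "begin" phase), then take the
// lock (the "complete" phase). Joining while holding rstlSketch could
// deadlock, because the worker may be blocked waiting for that same lock.
bool stepUpSketch(OldWorker& old) {
    old.join();
    std::lock_guard<std::mutex> lk(rstlSketch);
    return old.finished;
}
```

Swapping the two statements at the top of stepUpSketch reproduces the hazard: the caller would hold the lock while waiting for a thread that cannot finish without it.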

Comment by Spencer Brody (Inactive) [ 28/Jul/20 ]

I think for the purposes of this ticket it would probably be fine to just move the body of the balancer's onStepUpBegin to the very beginning of onStepUpComplete.

Comment by Kaloian Manassiev [ 28/Jul/20 ]

"With primary-only service, we should be able to stop old balancers in a different approach"

Just to be clear, the issue is not the stopping of the old balancers but the joining, which gives the implementors of ReplicaSetAwareService(s) a kind of "single-threaded" guarantee. As an implementor of such a service, I want clear points where I can join my threads instead of having to add internal synchronisation, such as terms. If this can be provided, then it sounds good to me; otherwise we are just pushing more work onto the service itself to compensate for something that has never been a problem (namely the joining of the threads).

Comment by Siyuan Zhou [ 28/Jul/20 ]

This is a great idea. The difference between onStepUpBegin and onStepUpComplete is subtle. Balancer::onStepUpBegin() only waits for the old balancer to finish. With primary-only service, we should be able to stop old balancers in a different approach. It's very valuable to minimize the contract between replication and the rest of the system.

I think this doesn't necessarily depend on SERVER-49533 since SERVER-49533 involves a deeper discussion of disk I/O and synchronization between stepup and primary-only service. This ticket is aligned well with the current design of primary-only service. kaloian.manassiev, does this make sense to you?
