[SERVER-44513] Implement TopologyCoordinator::awaitTopologyChange Created: 08/Nov/19  Updated: 29/Oct/23  Resolved: 05/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.1

Type: New Feature Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-44514 TopologyCoordinator::awaitTopologyCha... Closed
is depended on by SERVER-44518 Add topologyVersion to State Change E... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-11-18, Repl 2019-12-02, Repl 2019-12-16
Participants:

 Description   

Implement TopologyCoordinator::awaitTopologyChange. For this first ticket, only declare TopologyVersion struct, the TopologyCoordinator::topologyVersion field, and the awaitTopologyChange method, but don't react to actual topology changes, just wait for maxAwaitTimeMS.



 Comments   
Comment by Githook User [ 05/Dec/19 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@mongodb.com'}

Message: SERVER-44513 Implement ReplicationCoordinatorImpl::awaitIsMasterResponse
Branch: master
https://github.com/mongodb/mongo/commit/7c7c68478da22dab594d17cf3a18fb4635fe60be

Comment by Tess Avitabile (Inactive) [ 14/Nov/19 ]

jesse, I'm considering take a different approach for the implementation than the one described in the ticket, and I want to hear your thoughts.

I don't think we should share code with ReplicationCoordinatorImpl::_startWaitingForReplication() because the use cases are different. In _startWaitingForReplication(), we have a list of waiters ordered by OpTime, and we must notify waiters up to a certain OpTime. For topology version, all the waiters will get notified at once. This is because an isMaster command will either have a stale topology version (in which case we respond immediately), a future topology version (which is an error), or the current topology version, in which case we wait for the next topology change. Because all waiters will get notified at once, ldeng and I discussed that they could all wait on a single shared future (or perhaps one future per horizon, but more on that later).

I've also been thinking about SERVER-44514, where we want to construct the isMaster response for all the waiters at once. When the topology changes, we will want to take ReplicationCoordinatorImpl::_mutex only once, update the topology version, then construct an isMaster response for all waiters. If awaitTopologyChange() just returns a topology version, each waiter will need to then access this shared response somehow. For this reason, I think the method that does the waiting should return an isMaster response.

I would like to propose a different implementation as follows:

shared_ptr<const IsMasterResponse> ReplicationCoordinatorImpl::getIsMasterResponse(const SplitHorizon::Parameters& horizonParams, TopologyVersion previous, Milliseconds maxAwaitTime) {
    // If previous < current topology version, construct and return isMaster response.
 
    // If previous > current topology version, throw.
 
    // Wait with timeout maxAwaitTime on the future corresponding to horizonParams.
    // If this returns an isMaster response, return it. 
    // If we time out, construct and return an isMaster response.
}

For this ticket, we never fulfill the promises, so waiting would always time out. How does this approach sound to you?

EDIT: We agreed on the implementation described in this comment.

Comment by Githook User [ 13/Nov/19 ]

Author:

{'username': 'tessavitabile', 'email': 'tess.avitabile@mongodb.com', 'name': 'Tess Avitabile'}

Message: SERVER-44513 Declare TopologyVersion struct and TopologyCoordinator::_topologyVersion
Branch: master
https://github.com/mongodb/mongo/commit/9198f936ff192b80567f5dcea326c303b07d4a2e

Generated at Thu Feb 08 05:06:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.