[SERVER-44514] TopologyCoordinator::awaitTopologyChanges returns early on primary topology changes Created: 08/Nov/19  Updated: 29/Oct/23  Resolved: 18/Dec/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.3

Type: New Feature Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-44513 Implement TopologyCoordinator::awaitT... Closed
depends on SERVER-44509 Make isMaster wait for up to maxAwait... Closed
is depended on by SERVER-44515 Test that TopologyCoordinator::awaitT... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-12-02, Repl 2020-01-13
Participants:

 Description   

React to state changes on a node that was primary prior to the change. Arrange for the isMaster response to be constructed once and shared by all waiters.

Update isMaster to call the new ReplicationCoordinatorImpl::getIsMasterResponse.

Write the tests from the design doc for primary state changes.
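A minimal sketch of the intended waiter pattern, using plain std::promise/std::shared_future for illustration; the type names here (IsMasterWaiterList, the trimmed-down IsMasterResponse) are hypothetical stand-ins, not the server's actual classes:

    #include <future>
    #include <memory>
    #include <vector>

    // Hypothetical stand-in for the server's isMaster response; only the
    // fields relevant to this sketch are shown.
    struct IsMasterResponse {
        long long topologyVersion;
        bool isWritablePrimary;
    };

    // Hypothetical waiter list: each awaiting isMaster call parks a promise here.
    class IsMasterWaiterList {
    public:
        std::shared_future<std::shared_ptr<const IsMasterResponse>> addWaiter() {
            _waiters.emplace_back();
            return _waiters.back().get_future().share();
        }

        // On a primary topology change, construct the response once and fulfill
        // every parked waiter with the same shared object.
        void onTopologyChange(std::shared_ptr<const IsMasterResponse> response) {
            for (auto& promise : _waiters) {
                promise.set_value(response);
            }
            _waiters.clear();
        }

    private:
        std::vector<std::promise<std::shared_ptr<const IsMasterResponse>>> _waiters;
    };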



 Comments   
Comment by Githook User [ 18/Dec/19 ]

Author:

Jason Chan <jason.chan@mongodb.com> (jasonjhchan)

Message: SERVER-44514 awaitIsMasterResponse returns early on topology changes
Branch: master
https://github.com/mongodb/mongo/commit/e83c51c61fa9907459e1cfac9e4bb3517445612f

Comment by Jason Chan [ 10/Dec/19 ]

The work to fulfill promises after upgrading/downgrading FCV will be tracked through SERVER-45045.

Comment by Jason Chan [ 09/Dec/19 ]

Always fulfilling the promises whenever _updateMemberStateFromTopologyCoordinator is called sounds good to me. I agree with Jesse's sentiment that the cost of an extra isMaster is low compared to the cost of a possible bug.

Comment by Tess Avitabile (Inactive) [ 09/Dec/19 ]

Adding promise fulfillment to _updateMemberStateFromTopologyCoordinator sounds good to me. I think it will be insufficient to check that members are in a new state, since the isMaster response also depends on whether the node can accept writes (see here). This makes me think that we should always fulfill promises in this function, to prevent us from forgetting to fulfill promises if we introduce new topology changes in the future. The risk is that we would fulfill promises spuriously. What do you think?
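A rough illustration of the "always fulfill" approach under discussion, reusing the hypothetical types from the sketch in the description; this free function is a simplified stand-in for the real member function, not its actual signature:

    // Illustrative only (simplified stand-in for the real
    // ReplicationCoordinatorImpl::_updateMemberStateFromTopologyCoordinator):
    // unconditionally bump the topology version and fulfill every waiter at the
    // end of the member-state update. A spurious wakeup costs one extra isMaster;
    // a missed wakeup leaves a client waiting until maxAwaitTimeMS expires.
    void updateMemberStateFromTopologyCoordinator(IsMasterWaiterList& waiters,
                                                  long long& topologyVersion,
                                                  bool canAcceptWrites) {
        ++topologyVersion;
        auto response = std::make_shared<const IsMasterResponse>(
            IsMasterResponse{topologyVersion, canAcceptWrites});
        waiters.onTopologyChange(std::move(response));
    }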

As we discussed, we can file a follow-up ticket for handling reconfigs that change horizon mappings. If it doesn't work to make a decision on slack, I suggest handling this over email.

Comment by Jason Chan [ 09/Dec/19 ]

jesse, tess.avitabile
After doing some investigating, I propose adding the logic to ReplicationCoordinatorImpl::_updateMemberStateFromTopologyCoordinator.

This helper is called by the following methods:

We should also update the topology version on upgrading and downgrading FCV, as this changes the max/min wire versions.

Since a lot of these overlap, I think we can add the promise fulfillment logic to the end of _updateMemberStateFromTopologyCoordinator (after checking that members are in a new state), except for the case of replica set reconfigs. For replica set reconfigs, I believe we have to update the topology version each time, since it is possible that the horizons have been updated.

Question: What should we do for exhaust isMasters after a horizon change? I intended to build a map of waiters per horizon but from what I understand, mappings may change after a replica set reconfig. Should we require clients to re-initiate an exhaust stream after each reconfig or should we implement more logic to deal with these mappings?
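For the horizon question, one possible shape of the per-horizon map, again reusing the hypothetical types from the earlier sketches; whether exhaust streams are terminated or re-mapped on reconfig is exactly the open question above, so this only sketches the "re-initiate after each reconfig" option:

    #include <map>
    #include <string>

    // Hypothetical per-horizon bookkeeping (the real server may solve this
    // differently): waiters are parked under the horizon/hostname they
    // connected through. A reconfig that changes horizon mappings answers
    // everyone with a bumped topology version and discards the stale map,
    // forcing clients (and any exhaust streams) to re-issue isMaster against
    // the new mapping.
    class HorizonWaiters {
    public:
        std::shared_future<std::shared_ptr<const IsMasterResponse>> addWaiter(
            const std::string& horizonName) {
            return _byHorizon[horizonName].addWaiter();
        }

        void onReconfig(long long newTopologyVersion, bool canAcceptWrites) {
            auto response = std::make_shared<const IsMasterResponse>(
                IsMasterResponse{newTopologyVersion, canAcceptWrites});
            for (auto& entry : _byHorizon) {
                entry.second.onTopologyChange(response);
            }
            _byHorizon.clear();
        }

    private:
        std::map<std::string, IsMasterWaiterList> _byHorizon;
    };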
