- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- v4.4
- Repl 2020-04-06, Repl 2020-04-20
Currently, if a client has an exhaust isMaster connection with a replica set member and that member is removed from the set, the server keeps sending responses to the client. Each response is generated right away instead of waiting for a topology change or for maxAwaitTimeMS to elapse.
After some discussion, we decided on the following solution:
Removed or uninitialized nodes will respect the TopologyVersion passed in by the isMaster request:
- Requests with a stale TopologyVersion counter or a different processId will return immediately with an InvalidConfig response.
- Requests with a TopologyVersion counter equal to the server TopologyVersion counter will wait up to maxAwaitTimeMS for a topology change. If the request times out, it will return an InvalidConfig response.
- Requests that send a TopologyVersion counter greater than the server TopologyVersion counter will return an error with ok: 0.
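The three rules above can be sketched as a small decision function. This is only an illustration and not the server's implementation: the names `TopologyVersion`, `Action`, and `decide` are hypothetical, `process_id` is modeled as a plain string, and checking the processId before comparing counters is an assumption, since the ticket does not state which check takes precedence.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Reply immediately with an InvalidConfig response.
    INVALID_CONFIG_NOW = "invalid_config_now"
    # Wait up to maxAwaitTimeMS for a topology change; InvalidConfig on timeout.
    WAIT_FOR_CHANGE = "wait_for_change"
    # Reply with an error (ok: 0).
    ERROR = "error"


@dataclass(frozen=True)
class TopologyVersion:
    process_id: str  # stand-in for the real processId type (assumption)
    counter: int


def decide(server: TopologyVersion, client: TopologyVersion) -> Action:
    """Pick the behavior of a removed or uninitialized node for one request."""
    # Assumption: a processId mismatch is checked first, regardless of counter.
    if client.process_id != server.process_id:
        return Action.INVALID_CONFIG_NOW
    if client.counter < server.counter:
        # Stale counter: the client has not seen the latest topology.
        return Action.INVALID_CONFIG_NOW
    if client.counter == server.counter:
        return Action.WAIT_FOR_CHANGE
    # Client counter is ahead of the server: treated as an error.
    return Action.ERROR
```

For example, with a server at counter 5, a request carrying counter 4 is answered immediately, a request carrying counter 5 waits, and a request carrying counter 6 is rejected.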
On a replica set reconfig, the server will close connections from the server side if the horizon mappings for the server have changed. If there was no change to the horizon mappings, we will return ok: 1 and reply with an updated isMaster response.
On a reconfig that adds a node back into the replica set from REMOVED, the node will return a SplitHorizonChange error with ok: 0 and the server will disconnect from the client.
On a replSetInitiate, nodes will close connections from the server side if the requested horizon does not exist in the new config. Otherwise, they will return the updated isMaster response with ok: 1.
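As a rough summary of the reconfig and replSetInitiate cases described above, the outcomes can be written as two small functions. This is a sketch under stated assumptions: `Outcome`, `on_reconfig`, and `on_initiate` are hypothetical names, horizons are modeled as plain strings, and giving the "re-added from REMOVED" case priority over the horizon check is an assumption that the ticket does not spell out.

```python
from enum import Enum


class Outcome(Enum):
    CLOSE_CONNECTION = "close the connection from the server side"
    REPLY_OK = "reply with an updated isMaster response, ok: 1"
    ERROR_AND_DISCONNECT = "split-horizon error with ok: 0, then disconnect"


def on_reconfig(horizon_mappings_changed: bool,
                readded_from_removed: bool = False) -> Outcome:
    """Outcome for a waiting isMaster request when a reconfig is applied."""
    # Assumption: the REMOVED -> member transition is checked first.
    if readded_from_removed:
        return Outcome.ERROR_AND_DISCONNECT
    if horizon_mappings_changed:
        return Outcome.CLOSE_CONNECTION
    return Outcome.REPLY_OK


def on_initiate(requested_horizon: str, horizons_in_new_config: set) -> Outcome:
    """Outcome for a waiting isMaster request when replSetInitiate runs."""
    if requested_horizon not in horizons_in_new_config:
        return Outcome.CLOSE_CONNECTION
    return Outcome.REPLY_OK
```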
Splitting this into two tickets. This ticket will track the work to allow waiting on removed/uninitialized nodes. SERVER-47394 will track the work to add a server-side disconnect.
For a non-awaitable isMaster (no TopologyVersion in the request) sent to a server that does not yet have an initialized config or is in the REMOVED state, we will return a response with the following:
"ismaster" : false, "secondary" : false, "info" : "Does not have a valid replica set config",
This will be the same as the old protocol.
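A client-side check for this reply could look like the sketch below. The helper name `lacks_valid_config` is hypothetical, and the check uses only the three fields quoted above. A real reply may carry additional fields that are not listed in this ticket.

```python
def lacks_valid_config(reply: dict) -> bool:
    """Return True if an isMaster reply matches the removed/uninitialized shape."""
    return (
        reply.get("ismaster") is False
        and reply.get("secondary") is False
        and reply.get("info") == "Does not have a valid replica set config"
    )


# The three fields quoted in the ticket, written as a Python dict.
example_reply = {
    "ismaster": False,
    "secondary": False,
    "info": "Does not have a valid replica set config",
}
```

Calling `lacks_valid_config(example_reply)` returns `True`, while a reply from a healthy secondary (with `"secondary": True`) does not match.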
- is related to: SERVER-47557 Write unit test for interaction of quiesce mode and REMOVED state (Closed)
- related to: SERVER-47394 Have servers close connections on a SplitHorizonChange error for awaitable isMaster (Closed)