[SERVER-46893] Allow streamable isMaster to wait on removed/uninitialized nodes Created: 16/Mar/20  Updated: 29/Oct/23  Resolved: 15/Apr/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.4.0-rc3, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Jason Chan Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-47394 Have servers close connections on a S... Closed
is related to SERVER-47557 Write unit test for interaction of qu... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Repl 2020-04-06, Repl 2020-04-20
Participants:

 Description   

Currently, if a client has an exhaust isMaster connection with a replica member and the member becomes removed, the server will keep sending responses to the client. A response will be generated right away instead of waiting.

After some discussion, we decided on the following solution:

Removed or uninitialized nodes will respect the TopologyVersion passed in by the isMaster request:

  • Requests with a stale TopologyVersion counter or a different processId will return immediately with an InvalidConfig response.
  • Requests with a TopologyVersion counter equal to the server TopologyVersion counter will wait up to maxAwaitTimeMS for a topology change. If the request times out, return an InvalidConfig response
  • Requests that send a TopologyVersion counter greater than the server TopologyVersion counter will return an error with ok: 0.
    On a replica set reconfig, the server will close connections from the server side if the horizon mappings for the server have changed. If there was no change to the horizon mappings, we will return ok: 1 and reply with an updated isMaster response.

On a reconfig that adds a node back into the replica set from REMOVED, the node will get a SplitHorizon Error with ok: 0 and the server will disconnect from the client.
On a replSetInitiate, nodes will close connections from the server side if the requested horizon does not exist in this new config. Otherwise, return the updated isMaster request with ok: 1.

Splitting this into two tickets. This ticket will track the work to allow waiting on removed/uninitialized nodes. SERVER-47394 will track the work to add a server-side disconnect.

In the case of a non-awaitable isMaster (no TopologyVersion in the request) but the server does not yet have an initialized config or is in the REMOVED state, we will return a response with the following:

"ismaster" : false,
"secondary" : false,
"info" : "Does not have a valid replica set config",

This will be the same as the old protocol.



 Comments   
Comment by Githook User [ 23/Apr/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46893 Allow streamable isMaster to wait on removed/uninitialized nodes

(cherry picked from commit 2bad13a63315132a2793194d8d89f28dd7534928)

SERVER-47638 Ensure isMaster is waiting before calling replSetInitiate in AwaitableIsMasterOnNodeWithUninitializedConfigInvalidHorizon

(cherry picked from commit 523b9c9f92db20062ad6e3f42ceb80292e1a23f3)
Branch: v4.4
https://github.com/mongodb/mongo/commit/3064008dd3830bee2c18cea531fff565e26d47e5

Comment by Githook User [ 15/Apr/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46893 Allow streamable isMaster to wait on removed/uninitialized nodes
Branch: master
https://github.com/mongodb/mongo/commit/2bad13a63315132a2793194d8d89f28dd7534928

Comment by Tess Avitabile (Inactive) [ 10/Apr/20 ]

Thanks, jason.chan! Can you please update the ticket description with this plan?

Comment by Jason Chan [ 07/Apr/20 ]

After some discussion, we decided on the following solution:

Removed or uninitialized nodes will respect the TopologyVersion passed in by the isMaster request:

  • Requests with a stale TopologyVersion counter or a different processId will return immediately with an InvalidConfig response.
  • Requests with a TopologyVersion counter equal to the server TopologyVersion counter will wait up to maxAwaitTimeMS for a topology change. If the request times out, return an InvalidConfig response
  • Requests that send a TopologyVersion counter greater than the server TopologyVersion counter will return an error with ok: 0.

On a replica set reconfig, the server will close connections from the server side if the horizon mappings for the server have changed. If there was no change to the horizon mappings, we will return ok: 1 and reply with an updated isMaster response.

On a replSetInitiate, nodes will close connections from the server side if the requested horizon does not exist in this new config. Otherwise, return the updated isMaster request with ok: 1.

Splitting this into two tickets. This ticket will track the work to allow waiting on removed/uninitialized nodes. SERVER-47394 will track the work to replace the SplitHorizonChange error with a server-side disconnect.

Comment by Shane Harvey [ 01/Apr/20 ]

I am also in favor of Proposal 1. I don't think drivers should pick and choose when to using the streaming protocol based on the connection type as Proposal 2 suggests. I also opened DRIVERS-984 to add driver testing for this scenario. I'm not aware of any driver that tests connecting directly to an uninitialized replica set member.

Comment by Tess Avitabile (Inactive) [ 01/Apr/20 ]

I still feel pretty on the fence, and I'm interested in including others in the decision. I've asked Jason to turn your discussion notes into a short document that we can solicit comments on. Does that sound alright to you, jesse?

Comment by A. Jesse Jiryu Davis [ 01/Apr/20 ]

I think we should do Proposal 1. Arguments for it:

1. We'll change two implementations (mongod and mongos) instead of a dozen (the drivers).
2. More of the protocol's logic remains in the server with this proposal, which makes server-side improvements easier in the future.
3. I feel that Proposal 2 is hard to explain, because it mixes two layers we normally consider separately: the protocol layer between a client and one server, and the higher layer where a client connects to a standalone, replica set, or sharded cluster.

My opinion right now is still based mostly on intuition instead of reason.

Comment by A. Jesse Jiryu Davis [ 01/Apr/20 ]

Answers to Jason: Right, an uninitialized node should also respond with ok: 1, ismaster: false, secondary: false, and topologyVersion, every maxAwaitTimeMS / topologyVersion change.

> if a client sends an awaitable isMaster to a node (maybe for the first time) that is already REMOVED, we will wait at most maxAwaitTimeMS before the server responds with ok: 1, ismaster: false, secondary: false. Is that acceptable from a driver's perspective?

You might be misunderstanding my proposal. I think that a removed/uninit'ed node should implement the streaming isMaster protocol the exact same way as all other nodes. If a client sends an awaitable isMaster to a node that is already REMOVED, then that node replies immediately, the same as any other node does, as we've already specified for the streaming isMaster protocol. Then the node waits until maxAwaitTimeMS or a topologyVersion change, and replies again, and so on.

Comment by Jason Chan [ 01/Apr/20 ]

jessetess.avitabile
For my own clarification:

After reconfig that removes this node, respond with ok: 1, ismaster: false, secondary: false, and topologyVersion. Do this every maxAwaitTimeMS / topologyVersion change. The streaming protocol continues.

I believe this also applies to nodes with uninitialized config, correct? Also, if a client sends an awaitable isMaster to a node (maybe for the first time) that is already REMOVED, we will wait at most maxAwaitTimeMS before the server responds with ok: 1, ismaster: false, secondary: false. Is that acceptable from a driver's perspective?

Pros: Future-proof for cases where direct connections might start caring about instant notification of topologyVersion changes.

I think proposal 2 is much simpler to reason about and will result in server code that is easier to follow, but I vote for proposal 1 if we are able to imagine wanting direct connections to start caring about topology changes in the future.

Comment by A. Jesse Jiryu Davis [ 31/Mar/20 ]

2020-03-31 conversation among Tess, Jason, and me:

Solution requirements:

1. Clients must be able to do direct connections to removed/uninit'ed nodes. (Direct connections have no horizon param)
2. After a horizon change, a client using an RS connection with a horizon parameter must reboot the ismaster conversation
3. Don't bombard client with replies under any circumstances

Proposal 1:

Server logic:

  • After reconfig that removes this node, respond with ok: 1, ismaster: false, secondary: false, and topologyVersion. Do this every maxAwaitTimeMS / topologyVersion change. The streaming protocol continues.
  • After a reconfig or initiate, if the client's last isMaster request included horizon parameter, respond ok: 0, SplitHorizonError, no MoreToCome. Server now waits for a new request on this connection, or for the client to disconnect. When the client receives ok: 0 it disconnects, then sends new a isMaster request on a new connection; the server responds normally to it.
  • After a reconfig or initiate, if the client's last isMaster request did not include a horizon parameter, send a reply normally with default horizon.

Satisfies requirements:
1. Allows direct connections to removed/uninit'd, because such a member responds ok: 1.
2. Reboots RS connections after horizon changes, because any reconfig that might change horizons triggers ONE ok: 0 response.
3. We never bombard.

Pros: Future-proof for cases where direct connections might start caring about instant notification of topologyVersion changes.
Cons: More server work than Proposal 2.

Proposal 2:

Client omits topologyVersion, maxAwaitTimeMS from direct connection conversations. Never initiates streaming on direct conn.

IsMaster always returns ok: 0 if the request has topologyVersion and this member is removed/uninit'd. Returns ok: 1 even from removed/uninit'd members if request has no topologyVersion/maxAwaitTimeMS.

This permits direct connections, but also triggers reboot for RS connections if horizon might've changed.

Pros: Less server work, close to Jason's current patch.
Cons: More client logic, requires changes from current client implementations.

Comment by A. Jesse Jiryu Davis [ 31/Mar/20 ]

tess.avitabile this sounds like a problem. Drivers must be able to directly connect to uninitialized RS members. Quoth the Server Discovery and Monitoring Spec:

General Requirements
Direct connections: A client MUST be able to connect to a single server of any type. This includes querying hidden replica set members, and connecting to uninitialized members (see RSGhost) in order to run "replSetInitiate".

This is possible because in the old isMaster protocol, an uninitialized member replies:

{
  ok: 1,
  ismaster: false,
  secondary: false,
  isreplicaset: true,
  ... other fields ...
}

The driver recognizes the server as RSGhost. It's permitted to run commands on it if the driver is directly connected to it (no "replicaset=" in the MongoDB URI).

So our goal must be to permit drivers to make direct connections to uninitialized or removed members using the new protocol.

A possible solution?: An uninitialized or removed member replies the same as above, even when using the streaming protocol. The reply includes ok: 1, ismaster: false, secondary: false, etc., and it has no error code. The server ensures it sends this reply no more frequently than necessary (after the initial handshake, or a topologyVersion change, or maxAwaitTimeMS). The client will consider the member RSGhost as soon as it processes this reply, and it will not select this member when the client is in RS mode (its URI includes "replicaset="), but the client will be able to use it for a direct connection.

We've had to change our minds so often that I'm not sure if we've already considered this idea and found a fatal flaw.....

shane.harvey what's our testing status? Do we already have tests for some drivers that attempt a direct connection to an uninitialized or removed member with the new protocol?

Comment by Tess Avitabile (Inactive) [ 31/Mar/20 ]

Sorry for so much back and forth on this, but I'm having a small concern about whether this will affect the ability to run replSetInitiate through a driver that uses streamable isMaster. I'm worried about the following case:

Driver sends isMaster w/o topologyVersion

Server responds w/ topologyVersion

Driver sends isMaster w/ topologyVersion

Server responds w/ error because topologyVersion is requested and config is not yet initialized

Driver closes monitoring connection

Driver tries to run replSetInitiate

Do you expect this will be a problem? Do we usually run replSetInitiate through the drivers, or do drivers connect to sets that are already initiated?

CC jesse

Comment by Shane Harvey [ 30/Mar/20 ]

SGTM. Both of those responses will cause the driver to mark the server as Unknown. Note that the ok:0 error has the additional effect of causing the driver to close the monitoring connection which sounds fine to me.

Comment by Jason Chan [ 30/Mar/20 ]

shane.harvey, this ticket was updated to describe some slight additional changes.

In addition to the SplitHorizonError we described earlier for servers in the REMOVED state, awaitable isMasters on servers with configs that have not yet been initialized will return a NotYetInitialized error with ok: 0.

For non-awaitable isMasters (no TopologyVersion in the isMaster request) on servers that do not have a valid replica set config (either not yet initialized or member is REMOVED), we will follow the old protocol and respond with "Does not have a valid replica set config" in the "info" field.

Let us know if you see any problems with this.

Comment by Tess Avitabile (Inactive) [ 30/Mar/20 ]

My hope is that that isn't necessary. As part of this fix, we audited the code and saw that after this change, the server will always either return ok:0 or wait maxAwaitTimeMS for a topologyVersion change.

Comment by Shane Harvey [ 27/Mar/20 ]

Thanks, that answers it. In general, is it possible to add a safe-guard that avoids these kinds of bugs in the future? Perhaps the server could rate limit the isMaster stream in some way? For example, the server could use a backoff that starts at 0 seconds for X number of consecutive responses and then increases to maxAwaitTimeMS. The backoff could be triggered when too many responses are sent in some small time frame.

Comment by Jason Chan [ 27/Mar/20 ]

shane.harvey I updated the ticket description. Let me know if this answers your question.

CC: tess.avitabile

Comment by Shane Harvey [ 27/Mar/20 ]

Can you clarify what the isMaster response is in this case? Is it ok:0 or ok:1? I ask because in the driver's spec we decided that a client should close the monitoring connection after receiving an ok:0 streaming isMaster response (and wait heartbeatFrequencyMS before running a new isMaster on a new connection).

CC: matt.broadstone.

Generated at Thu Feb 08 05:12:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.