[SERVER-46893] Allow streamable isMaster to wait on removed/uninitialized nodes Created: 16/Mar/20 Updated: 29/Oct/23 Resolved: 15/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc3, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jason Chan | Assignee: | Jason Chan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.4
|
||||||||||||||||
| Sprint: | Repl 2020-04-06, Repl 2020-04-20 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Currently, if a client has an exhaust isMaster connection with a replica member and the member becomes removed, the server will keep sending responses to the client. A response will be generated right away instead of waiting. After some discussion, we decided on the following solution: Removed or uninitialized nodes will respect the TopologyVersion passed in by the isMaster request:
On a reconfig that adds a node back into the replica set from REMOVED, the node will get a SplitHorizon Error with ok: 0 and the server will disconnect from the client. Splitting this into two tickets. This ticket will track the work to allow waiting on removed/uninitialized nodes. In the case of a non-awaitable isMaster (no TopologyVersion in the request) but the server does not yet have an initialized config or is in the REMOVED state, we will return a response with the following:
This will be the same as the old protocol. |
| Comments |
| Comment by Githook User [ 23/Apr/20 ] | |||||||
|
Author: {'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}Message: (cherry picked from commit 2bad13a63315132a2793194d8d89f28dd7534928)
(cherry picked from commit 523b9c9f92db20062ad6e3f42ceb80292e1a23f3) | |||||||
| Comment by Githook User [ 15/Apr/20 ] | |||||||
|
Author: {'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}Message: | |||||||
| Comment by Tess Avitabile (Inactive) [ 10/Apr/20 ] | |||||||
|
Thanks, jason.chan! Can you please update the ticket description with this plan? | |||||||
| Comment by Jason Chan [ 07/Apr/20 ] | |||||||
|
After some discussion, we decided on the following solution: Removed or uninitialized nodes will respect the TopologyVersion passed in by the isMaster request:
On a replica set reconfig, the server will close connections from the server side if the horizon mappings for the server have changed. If there was no change to the horizon mappings, we will return ok: 1 and reply with an updated isMaster response. On a replSetInitiate, nodes will close connections from the server side if the requested horizon does not exist in this new config. Otherwise, return the updated isMaster request with ok: 1. Splitting this into two tickets. This ticket will track the work to allow waiting on removed/uninitialized nodes. | |||||||
| Comment by Shane Harvey [ 01/Apr/20 ] | |||||||
|
I am also in favor of Proposal 1. I don't think drivers should pick and choose when to using the streaming protocol based on the connection type as Proposal 2 suggests. I also opened DRIVERS-984 to add driver testing for this scenario. I'm not aware of any driver that tests connecting directly to an uninitialized replica set member. | |||||||
| Comment by Tess Avitabile (Inactive) [ 01/Apr/20 ] | |||||||
|
I still feel pretty on the fence, and I'm interested in including others in the decision. I've asked Jason to turn your discussion notes into a short document that we can solicit comments on. Does that sound alright to you, jesse? | |||||||
| Comment by A. Jesse Jiryu Davis [ 01/Apr/20 ] | |||||||
|
I think we should do Proposal 1. Arguments for it: 1. We'll change two implementations (mongod and mongos) instead of a dozen (the drivers). My opinion right now is still based mostly on intuition instead of reason. | |||||||
| Comment by A. Jesse Jiryu Davis [ 01/Apr/20 ] | |||||||
|
Answers to Jason: Right, an uninitialized node should also respond with ok: 1, ismaster: false, secondary: false, and topologyVersion, every maxAwaitTimeMS / topologyVersion change. > if a client sends an awaitable isMaster to a node (maybe for the first time) that is already REMOVED, we will wait at most maxAwaitTimeMS before the server responds with ok: 1, ismaster: false, secondary: false. Is that acceptable from a driver's perspective? You might be misunderstanding my proposal. I think that a removed/uninit'ed node should implement the streaming isMaster protocol the exact same way as all other nodes. If a client sends an awaitable isMaster to a node that is already REMOVED, then that node replies immediately, the same as any other node does, as we've already specified for the streaming isMaster protocol. Then the node waits until maxAwaitTimeMS or a topologyVersion change, and replies again, and so on. | |||||||
| Comment by Jason Chan [ 01/Apr/20 ] | |||||||
|
jessetess.avitabile
I believe this also applies to nodes with uninitialized config, correct? Also, if a client sends an awaitable isMaster to a node (maybe for the first time) that is already REMOVED, we will wait at most maxAwaitTimeMS before the server responds with ok: 1, ismaster: false, secondary: false. Is that acceptable from a driver's perspective?
I think proposal 2 is much simpler to reason about and will result in server code that is easier to follow, but I vote for proposal 1 if we are able to imagine wanting direct connections to start caring about topology changes in the future. | |||||||
| Comment by A. Jesse Jiryu Davis [ 31/Mar/20 ] | |||||||
|
2020-03-31 conversation among Tess, Jason, and me: Solution requirements: 1. Clients must be able to do direct connections to removed/uninit'ed nodes. (Direct connections have no horizon param) Proposal 1: Server logic:
Satisfies requirements: Pros: Future-proof for cases where direct connections might start caring about instant notification of topologyVersion changes. Proposal 2: Client omits topologyVersion, maxAwaitTimeMS from direct connection conversations. Never initiates streaming on direct conn. IsMaster always returns ok: 0 if the request has topologyVersion and this member is removed/uninit'd. Returns ok: 1 even from removed/uninit'd members if request has no topologyVersion/maxAwaitTimeMS. This permits direct connections, but also triggers reboot for RS connections if horizon might've changed. Pros: Less server work, close to Jason's current patch. | |||||||
| Comment by A. Jesse Jiryu Davis [ 31/Mar/20 ] | |||||||
|
tess.avitabile this sounds like a problem. Drivers must be able to directly connect to uninitialized RS members. Quoth the Server Discovery and Monitoring Spec:
This is possible because in the old isMaster protocol, an uninitialized member replies:
The driver recognizes the server as RSGhost. It's permitted to run commands on it if the driver is directly connected to it (no "replicaset=" in the MongoDB URI). So our goal must be to permit drivers to make direct connections to uninitialized or removed members using the new protocol. A possible solution?: An uninitialized or removed member replies the same as above, even when using the streaming protocol. The reply includes ok: 1, ismaster: false, secondary: false, etc., and it has no error code. The server ensures it sends this reply no more frequently than necessary (after the initial handshake, or a topologyVersion change, or maxAwaitTimeMS). The client will consider the member RSGhost as soon as it processes this reply, and it will not select this member when the client is in RS mode (its URI includes "replicaset="), but the client will be able to use it for a direct connection. We've had to change our minds so often that I'm not sure if we've already considered this idea and found a fatal flaw..... shane.harvey what's our testing status? Do we already have tests for some drivers that attempt a direct connection to an uninitialized or removed member with the new protocol? | |||||||
| Comment by Tess Avitabile (Inactive) [ 31/Mar/20 ] | |||||||
|
Sorry for so much back and forth on this, but I'm having a small concern about whether this will affect the ability to run replSetInitiate through a driver that uses streamable isMaster. I'm worried about the following case: Driver sends isMaster w/o topologyVersion Server responds w/ topologyVersion Driver sends isMaster w/ topologyVersion Server responds w/ error because topologyVersion is requested and config is not yet initialized Driver closes monitoring connection Driver tries to run replSetInitiate Do you expect this will be a problem? Do we usually run replSetInitiate through the drivers, or do drivers connect to sets that are already initiated? CC jesse | |||||||
| Comment by Shane Harvey [ 30/Mar/20 ] | |||||||
|
SGTM. Both of those responses will cause the driver to mark the server as Unknown. Note that the ok:0 error has the additional effect of causing the driver to close the monitoring connection which sounds fine to me. | |||||||
| Comment by Jason Chan [ 30/Mar/20 ] | |||||||
|
shane.harvey, this ticket was updated to describe some slight additional changes. In addition to the SplitHorizonError we described earlier for servers in the REMOVED state, awaitable isMasters on servers with configs that have not yet been initialized will return a NotYetInitialized error with ok: 0. For non-awaitable isMasters (no TopologyVersion in the isMaster request) on servers that do not have a valid replica set config (either not yet initialized or member is REMOVED), we will follow the old protocol and respond with "Does not have a valid replica set config" in the "info" field. Let us know if you see any problems with this. | |||||||
| Comment by Tess Avitabile (Inactive) [ 30/Mar/20 ] | |||||||
|
My hope is that that isn't necessary. As part of this fix, we audited the code and saw that after this change, the server will always either return ok:0 or wait maxAwaitTimeMS for a topologyVersion change. | |||||||
| Comment by Shane Harvey [ 27/Mar/20 ] | |||||||
|
Thanks, that answers it. In general, is it possible to add a safe-guard that avoids these kinds of bugs in the future? Perhaps the server could rate limit the isMaster stream in some way? For example, the server could use a backoff that starts at 0 seconds for X number of consecutive responses and then increases to maxAwaitTimeMS. The backoff could be triggered when too many responses are sent in some small time frame. | |||||||
| Comment by Jason Chan [ 27/Mar/20 ] | |||||||
|
shane.harvey I updated the ticket description. Let me know if this answers your question. CC: tess.avitabile | |||||||
| Comment by Shane Harvey [ 27/Mar/20 ] | |||||||
|
Can you clarify what the isMaster response is in this case? Is it ok:0 or ok:1? I ask because in the driver's spec we decided that a client should close the monitoring connection after receiving an ok:0 streaming isMaster response (and wait heartbeatFrequencyMS before running a new isMaster on a new connection). CC: matt.broadstone. |