[DRIVERS-2735] Add testing for the streamable monitoring protocol without exhaust support Created: 26/Sep/23  Updated: 20/Nov/23

Status: Backlog
Project: Drivers
Component/s: SDAM
Fix Version/s: None

Type: Spec Change Priority: Unknown
Reporter: Bailey Pearson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: FY25Q1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by NODE-5650 Monitors do not handle a `moreToCome:... Blocked
Related
is related to NODE-5650 Monitors do not handle a `moreToCome:... Blocked
Driver Changes: Needed

 Description   

Summary

The SDAM spec outlines the following criteria for how a driver handles streamable hello responses (truncated so only the relevant bullet is shown):

A client follows these rules when processing the hello or legacy hello exhaust response:

  • If the response is successful (includes "ok:1") and does not include the OP_MSG moreToCome flag, then the client initiates a new awaitable hello or legacy hello with the topologyVersion field from the previous response.
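
As a minimal sketch of the decision this bullet describes (not any driver's actual implementation): the `HelloReply` type, the `next_monitor_step` function, and the returned step names are hypothetical, and the error branch is an assumption since the quoted excerpt is truncated. The two flag constants are the OP_MSG flag bits from the wire protocol.

```python
from dataclasses import dataclass
from typing import Optional

# OP_MSG flag bits as defined by the MongoDB wire protocol.
MORE_TO_COME = 1 << 1
EXHAUST_ALLOWED = 1 << 16


@dataclass
class HelloReply:
    """Hypothetical, simplified view of one hello / legacy hello reply."""
    ok: int
    flag_bits: int
    topology_version: Optional[dict] = None


def next_monitor_step(reply: HelloReply) -> str:
    """Return what the monitor should do after processing `reply`."""
    if reply.ok != 1:
        # Assumption: error handling is not covered by the quoted bullet.
        return "handle_error"
    if reply.flag_bits & MORE_TO_COME:
        # The server will keep streaming, so only read the next reply.
        return "read_next_reply"
    # Successful reply without moreToCome: the stream has ended, so the
    # client must send a new awaitable hello carrying the topologyVersion
    # from this reply.
    return "send_awaitable_hello"
```

The case this ticket is concerned with is the last branch: a driver that only ever takes the `read_next_reply` path would wait forever for a reply the server is not going to send.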

There are no tests that ensure drivers correctly re-initiate a new hello if the server responds with `moreToCome: false` when exhaust support is enabled.  Additionally, there are no tests at all verifying that a driver still correctly monitors the server using the streaming protocol when exhaust support is disabled.

A recent Slack thread highlighted the consequences of incorrectly handling either scenario: the driver may end up with a stale topology that will never be updated [1].

There is currently no known scenario where the server might respond to a streamable hello using exhaust_allowed with `moreToCome: false`, so this has not been an issue in practice.  Server behavior could change, though, so we should add testing for it.

Adding testing for the streaming protocol without exhaust support may also be valuable.  This may not be an issue in practice though, since drivers (like Node) might unconditionally use exhaust support for the streaming protocol.

[1] At least until the monitor's connections encounter an error and close, forcing the monitor to create a new monitoring connection and re-initiate the monitoring process.  Until then, the driver has a perpetually stale topology.

Motivation

Who is the affected end user?

Potentially users, depending on server behavior.

How does this affect the end user?

Are they blocked? Are they annoyed? Are they confused?

How likely is it that this problem or use case will occur?

Main path? Edge case?

If the problem does occur, what are the consequences and how severe are they?

If the scenarios this ticket is testing are incorrectly implemented by drivers, this can result in stale topologies that are never updated.

For example, Node does not properly handle the streaming protocol when a hello with `EXHAUST_ALLOWED` enabled returns `moreToCome: false`.  If this scenario occurred with the Node driver, the driver's SDAM would hang indefinitely.

There is currently no known scenario where this may occur, but this could change.

Is this issue urgent?

unknown.

Is this ticket required by a downstream team?

nope.

Is this ticket only for tests?

yes.

Acceptance Criteria

  • Add testing to ensure that drivers handle `moreToCome: false` properly when using the streaming protocol with exhaust support enabled.
  • Determine if testing should be added for the streaming protocol without exhaust support.  If tests are deemed valuable, add testing for the streaming protocol without exhaust support.
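
A rough sketch of what such tests could look like, assuming a scripted fake connection rather than a real server or any spec test format. `FakeConnection`, `run_monitor`, `tv`, and the simplified `topologyVersion` documents are all hypothetical, every scripted reply is assumed to be successful (`ok: 1`), and the `maxAwaitTimeMS` value is arbitrary.

```python
from collections import deque

MORE_TO_COME = 1 << 1      # OP_MSG flag bit: server will send more replies
EXHAUST_ALLOWED = 1 << 16  # OP_MSG flag bit: client accepts exhaust replies


def tv(counter):
    """Simplified topologyVersion document (real ones also carry processId)."""
    return {"counter": counter}


class FakeConnection:
    """Hypothetical monitoring connection that returns scripted replies."""

    def __init__(self, replies):
        self.replies = deque(replies)  # (reply document, reply flag bits)
        self.sent = []                 # every (command, flag bits) sent

    def send(self, command, flag_bits):
        self.sent.append((command, flag_bits))

    def read(self):
        return self.replies.popleft()


def run_monitor(conn, use_exhaust, topology_version, max_await_ms=10_000):
    """Consume all scripted replies; return the topologyVersions observed."""
    flags = EXHAUST_ALLOWED if use_exhaust else 0
    seen = []
    need_send = True
    while conn.replies:
        if need_send:
            hello = {
                "hello": 1,
                "topologyVersion": topology_version,
                "maxAwaitTimeMS": max_await_ms,
            }
            conn.send(hello, flags)
        reply, reply_flags = conn.read()
        topology_version = reply["topologyVersion"]
        seen.append(topology_version)
        # Without moreToCome the stream is over: a new hello is required.
        need_send = not (reply_flags & MORE_TO_COME)
    return seen


def test_exhaust_stream_ends_with_more_to_come_false():
    conn = FakeConnection([
        ({"ok": 1, "topologyVersion": tv(1)}, MORE_TO_COME),
        ({"ok": 1, "topologyVersion": tv(2)}, 0),  # server ends the stream
        ({"ok": 1, "topologyVersion": tv(3)}, MORE_TO_COME),
    ])
    seen = run_monitor(conn, use_exhaust=True, topology_version=tv(0))
    assert seen == [tv(1), tv(2), tv(3)]
    # One initial hello, plus one re-initiated after `moreToCome: false`.
    assert [cmd["topologyVersion"] for cmd, _ in conn.sent] == [tv(0), tv(2)]


def test_streaming_without_exhaust_support():
    conn = FakeConnection([
        ({"ok": 1, "topologyVersion": tv(n)}, 0) for n in (1, 2, 3)
    ])
    seen = run_monitor(conn, use_exhaust=False, topology_version=tv(0))
    assert seen == [tv(1), tv(2), tv(3)]
    # Every reply must be followed by a fresh awaitable hello.
    assert [cmd["topologyVersion"] for cmd, _ in conn.sent] == [tv(0), tv(1), tv(2)]
    assert all(flags == 0 for _, flags in conn.sent)
```

A monitor that never re-sends after `moreToCome: false` would fail the first test by blocking on a reply that is not coming, which is the stale-topology failure described in the Summary.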


 Comments   
Comment by Shane Harvey [ 20/Nov/23 ]

Sure, FY25Q1 sounds fine. This is not an emergency issue since servers never send moreToCome: false on a hello exhaust stream, but it would be good to test regardless.

Comment by Daria Pardue [ 20/Nov/23 ]

shane.harvey@mongodb.com Since the team QP plans for this quarter are already set, the best way to move this forward would be to add this ticket to the FY25Q1 planning whenever we do the next call for spec owner recommendations (I think you could also add the quarter label now and it will pop up in planning). Though if this is an emergency, feel free to send it through leads triage.

Generated at Thu Feb 08 08:26:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.