[DRIVERS-2458] Add test that monitors do not create excessive connections during quiesce mode
Created: 30/Sep/22 Updated: 26/Jun/23
| Status: | Backlog |
| Project: | Drivers |
| Component/s: | SDAM |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Patrick Freed | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Driver Changes: | Needed |
| Description |
Summary

We currently lack tests for driver behavior when a server is shutting down (in quiesce mode). In particular, no test verifies that driver monitors do not repeatedly attempt to create new connections to a server that is in quiesce mode. This bug is easy to introduce and has already been observed in the Java driver (see the HELP ticket).

Motivation

Who is the affected end user?
Driver authors, and potentially users if a bug is discovered.

How does this affect the end user?
If a driver is affected by this bug, it will create and close a high number of connections for the duration of the quiesce period. This can contribute to connection storms and cluster instability. One customer was affected by this bug in the Java driver. See the above HELP ticket.

How likely is it that this problem or use case will occur?
If the driver is affected by this bug, it will occur any time the server enters quiesce mode (i.e. every time it shuts down, so during most planned and unplanned maintenance events).

If the problem does occur, what are the consequences and how severe are they?
The driver will open a large number of connections. The consequences may vary, but they could include application performance degradation, network performance degradation, and potentially a crash of the quiescing node.

Is this issue urgent?
Fixing the bug is urgent if a driver is affected by it. The test helps identify whether or not that is the case.

Is this ticket required by a downstream team?
No

Is this ticket only for tests?
Yes, unless a bug is discovered.

An example prose test could enable a failpoint on hello that returns a ShutdownInProgress error, then create a client with heartbeatFrequencyMS = 500, sleep for 2 seconds, and assert that the number of observed heartbeatFailedEvents is between 3 and 5. |
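The example prose test in the description can be sketched roughly as follows. This is a minimal, driver-agnostic sketch rather than a proposed spec test: the helper names (`hello_failpoint`, `expected_failure_bounds`, `check_failed_heartbeats`), the `appName` filter, and the one-heartbeat slack are assumptions made for illustration. The failpoint document uses the server's `failCommand` failpoint with error code 91 (ShutdownInProgress). The bounds reproduce the "between 3 and 5" range from the nominal count of 2000 / 500 = 4 heartbeats.

```python
SHUTDOWN_IN_PROGRESS = 91  # server error code for ShutdownInProgress


def hello_failpoint(app_name):
    """Build a failCommand failpoint that fails every hello from one client.

    The appName filter is an assumption of this sketch; it keeps the
    failpoint from affecting other clients connected to the same server.
    """
    return {
        "configureFailPoint": "failCommand",
        "mode": "alwaysOn",
        "data": {
            "failCommands": ["hello"],
            "errorCode": SHUTDOWN_IN_PROGRESS,
            "appName": app_name,
        },
    }


def expected_failure_bounds(heartbeat_ms, window_ms, slack=1):
    """Return (low, high) bounds on failed heartbeats seen in the window.

    A well-behaved monitor checks roughly once per heartbeat interval, so
    the nominal count is window / interval. The slack (hypothetical, one
    heartbeat each way) absorbs timing jitter.
    """
    nominal = window_ms // heartbeat_ms
    return max(nominal - slack, 0), nominal + slack


def check_failed_heartbeats(observed, heartbeat_ms=500, window_ms=2000):
    """Raise AssertionError if the monitor failed too few or too many checks."""
    low, high = expected_failure_bounds(heartbeat_ms, window_ms)
    if not low <= observed <= high:
        raise AssertionError(
            f"expected between {low} and {high} failed heartbeats, "
            f"observed {observed}"
        )
```

In an actual driver test, `observed` would be the number of heartbeat failed events captured by the driver's SDAM monitoring listener while the failpoint is active. A monitor that reconnects in a tight loop would report far more than 5 failures in 2 seconds and fail the check.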
| Comments |
| Comment by Neal Beeken [ 06/Oct/22 ] |
|
Triage Notes: we have a similar test: |