Drivers / DRIVERS-2458

Add test that monitors do not create excessive connections during quiesce mode

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Component/s: SDAM
    • Labels: None
    • Needed

      Summary

      We currently lack tests for the behavior of the driver when a server is shutting down / in quiesce mode. In particular, we lack a case verifying that driver monitors do not repeatedly attempt to create new connections to the server when it is in quiesce mode. This bug is easy to introduce, and has been observed in the Java driver already (see JAVA-4743 and HELP-37852). We should add a test case for this to ensure other drivers are unaffected.

      Motivation

      Who is the affected end user?

      Driver authors, and potentially users if a bug is discovered.

      How does this affect the end user?

      If a driver is affected by this bug, it will create and close a high number of connections for the duration of the quiesce period. This can contribute to connection storms and cluster instability.

      One customer was affected by this bug in the Java driver. See the above HELP ticket.

      How likely is it that this problem or use case will occur?

      If the driver is affected by this bug, then it will occur any time the server enters quiesce mode (i.e. every time it shuts down, so most planned and unplanned maintenance events).

      If the problem does occur, what are the consequences and how severe are they?

      The driver will create a large number of connections. The consequences may vary, but this could degrade application performance, degrade network performance, and potentially crash the quiescing node.

      Is this issue urgent?

      Fixing the bug is urgent if a driver is affected by it. The test helps identify whether or not that is the case.

      Is this ticket required by a downstream team?

      No

      Is this ticket only for tests?

      Yes, unless a bug is discovered.

      An example prose test could enable a failpoint that makes hello fail with a ShutdownInProgress error, then create a client with heartbeatFrequencyMS = 500, sleep for 2 seconds, and assert that the number of observed heartbeatFailedEvents is between 3 and 5.
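The prose test above could be sketched roughly as follows in Python with PyMongo. This is only an illustration under several assumptions, not a proposed spec test: it assumes a local server started with test commands enabled (so that the `failCommand` failpoint is available), it assumes the failpoint can be scoped with `appName`, and the helper names `failed_heartbeats_in_range` and `run_quiesce_check`, the URI, and the app name are hypothetical. Error code 91 is the server's ShutdownInProgress code.

```python
import time

SHUTDOWN_IN_PROGRESS = 91  # server error code for ShutdownInProgress


def failed_heartbeats_in_range(count, low=3, high=5):
    """Return True if the failure count is within the bounds from the ticket."""
    return low <= count <= high


def run_quiesce_check(uri="mongodb://localhost:27017", app_name="quiesceTest"):
    """Hypothetical helper: returns the number of failed heartbeats seen in 2 s."""
    # Imported lazily so the helper above can be used without PyMongo installed.
    from pymongo import MongoClient, monitoring

    class FailedHeartbeatCounter(monitoring.ServerHeartbeatListener):
        def __init__(self):
            self.failed_count = 0

        def started(self, event):
            pass

        def succeeded(self, event):
            pass

        def failed(self, event):
            self.failed_count += 1

    # A separate client configures the failpoint. Its appName differs, so its
    # own monitor is not affected (assumption: appName filtering is supported).
    setup_client = MongoClient(uri)
    setup_client.admin.command({
        "configureFailPoint": "failCommand",
        "mode": "alwaysOn",
        "data": {
            "failCommands": ["hello", "isMaster"],
            "errorCode": SHUTDOWN_IN_PROGRESS,
            "appName": app_name,
        },
    })
    try:
        counter = FailedHeartbeatCounter()
        client = MongoClient(
            uri,
            appName=app_name,
            heartbeatFrequencyMS=500,
            event_listeners=[counter],
        )
        time.sleep(2)
        client.close()
        return counter.failed_count
    finally:
        # Always disable the failpoint so later tests see a healthy server.
        setup_client.admin.command(
            {"configureFailPoint": "failCommand", "mode": "off"}
        )
        setup_client.close()
```

A test would then assert `failed_heartbeats_in_range(run_quiesce_check())`. A driver that reconnects in a tight loop while the server reports ShutdownInProgress would be expected to report far more than 5 failures in the 2-second window.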

            Assignee: Unassigned
            Reporter: Patrick Freed (patrick.freed@mongodb.com)