Avoid clearing the connection pool when the server connection rate limiter triggers


      Key              Status/Resolution   FixVersion
      CDRIVER-6083     Fixed               2.3.0
      CXX-3331         Backlog
      CSHARP-5711      Fixed               3.7.0
      GODRIVER-3646    In Progress
      JAVA-5949        In Code Review
      NODE-7121        Fixed               7.1.0
      PYTHON-5517      Done
      PHPLIB-1713      Won't Do
      RUBY-3700        In Progress
      RUST-2267        Fixed               3.6.0

      Summary

      Avoid clearing the connection pool when the server connection rate limiter triggers.

      Motivation

      When a driver creates a new connection to an overloaded server and the server rejects it because of its ingress connection rate limiter, the driver reacts by clearing the connection pool, closing the SDAM monitoring connection, and triggering an immediate SDAM check. This is harmful for two reasons:
      1) The immediate SDAM check must itself create a new connection, which will likely fail for the same reason.
      2) The existing connections in the pool were healthy, and there was no reason to clear them. The client now has to repopulate the pool, which puts even more connection-creation pressure on the already overloaded node.

      In practice, this behavior acts as a poorly designed circuit breaker: it shuts off traffic to the overloaded server, the server recovers after a while, traffic resumes and hits the rate limit again, and traffic is shut off once more. This cycle can lead to a metastable failure mode.
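      To make the requested change concrete, here is a minimal Python sketch contrasting the current handling described above with the proposed one. It is not taken from any driver: `ServerState` is a deliberately simplified, hypothetical model of one server's pool and monitor state, and `RATE_LIMIT_LABEL` is a hypothetical marker, because this ticket does not say how a driver recognizes a rate-limiter rejection.

```python
from dataclasses import dataclass

# HYPOTHETICAL: this ticket does not specify how a driver recognizes a
# rejection from the server's ingress connection rate limiter. This label
# stands in for whatever signal a real implementation would use.
RATE_LIMIT_LABEL = "HypotheticalRateLimitRejection"


@dataclass
class ServerState:
    """Simplified, hypothetical model of one server's pool and monitor."""
    pool_generation: int = 0      # bumped whenever the pool is cleared
    pooled_connections: int = 5   # healthy, idle connections in the pool
    monitor_connected: bool = True
    immediate_checks_requested: int = 0


class HandshakeError(Exception):
    """Failure while establishing a new pooled connection."""

    def __init__(self, message, labels=()):
        super().__init__(message)
        self.labels = frozenset(labels)


def handle_current(state: ServerState, err: HandshakeError) -> None:
    # Behavior described in the ticket: any connection-creation failure
    # clears the pool, closes the SDAM monitoring connection, and requests
    # an immediate SDAM check (which must itself open a new connection).
    state.pool_generation += 1
    state.pooled_connections = 0
    state.monitor_connected = False
    state.immediate_checks_requested += 1


def handle_proposed(state: ServerState, err: HandshakeError) -> None:
    if RATE_LIMIT_LABEL in err.labels:
        # The server is reachable but shedding new connections: keep the
        # healthy pooled connections and leave the monitor alone.
        return
    # Any other failure still takes the existing clear-and-recheck path.
    handle_current(state, err)


if __name__ == "__main__":
    rejected = HandshakeError("connection rejected", {RATE_LIMIT_LABEL})

    current = ServerState()
    handle_current(current, rejected)
    print(current)

    proposed = ServerState()
    handle_proposed(proposed, rejected)
    print(proposed)
```

      Under the proposed handling, a rate-limiter rejection leaves the pooled connections and the monitor untouched, so the client does not have to reopen connections against a node that is already refusing them. Any other handshake error still goes through the existing path.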

      Who is the affected end user?

      Who are the stakeholders?

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?

      How likely is it that this problem or use case will occur?

      Main path? Edge case?

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?

      Acceptance Criteria

      What specific requirements must be met to consider the design phase complete?

            Assignee:
            Steve Silvester
            Reporter:
            Shane Harvey
            Jib Adegunloye
            Votes:
            0
            Watchers:
            10

              Created:
              Updated: