Avoid clearing the connection pool when the server connection rate limiter triggers


    • Needed

      Summary of necessary driver changes

      •  

      Commits for syncing spec/prose tests
      (and/or refer to an existing language POC if needed)

      •  

      Context for other referenced/linked tickets

      •  
      Key Status/Resolution FixVersion
      CDRIVER-6083 Blocked
      CXX-3331 Blocked
      CSHARP-5711 Blocked
      GODRIVER-3646 Blocked
      JAVA-5949 Blocked
      NODE-7121 Blocked
      PYTHON-5517 Blocked
      PHPLIB-1713 Blocked
      RUBY-3700 Blocked
      RUST-2267 Blocked

      Summary

      Avoid clearing the connection pool when the server connection rate limiter triggers.

      Motivation

      When a driver tries to create a new connection to an overloaded server and the server rejects it because of the ingress connection rate limiter, the driver reacts by clearing the pool, closing the SDAM connection, and triggering an immediate SDAM check. This is bad for two reasons: 1) the immediate SDAM check needs to create a new connection, which will likely fail for the same reason; 2) the existing connections in the pool were healthy, so there was no reason to clear them. The client now has to repopulate the connection pool, which puts even more connection creation pressure on the already overloaded node.

      In practice, this behavior acts as a poor circuit breaker: it shuts off traffic to the overloaded server, the server recovers after a time, then it hits the rate limit again, which shuts off traffic again. This cycle can lead to a metastable failure mode.
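
      The difference between the current and the requested behavior can be sketched with a toy model. This is only an illustration under stated assumptions, not any driver's actual implementation: `Pool`, `ConnectError`, `handle_connect_error`, and the `rate_limited` flag are hypothetical names, and the ticket does not say how a driver would recognize a rate-limiter rejection. Closing the SDAM connection is left out for brevity.

```python
class ConnectError(Exception):
    """Hypothetical error raised while establishing a new connection."""

    def __init__(self, message, rate_limited=False):
        super().__init__(message)
        # Assumption: the driver has some way to tell that the server's
        # ingress connection rate limiter caused the rejection.
        self.rate_limited = rate_limited


class Pool:
    """Toy stand-in for a driver connection pool (not a real driver class)."""

    def __init__(self, idle_connections):
        self.idle = list(idle_connections)
        self.generation = 0
        self.immediate_checks = 0  # number of immediate SDAM checks requested

    def clear(self):
        # Clearing discards every pooled connection and bumps the generation.
        self.idle.clear()
        self.generation += 1

    def request_immediate_check(self):
        self.immediate_checks += 1


def handle_connect_error(pool, err, avoid_clear_on_rate_limit):
    """React to a failed connection attempt.

    With avoid_clear_on_rate_limit=False this mirrors the behavior the
    ticket describes today; with True it mirrors the requested behavior.
    """
    if avoid_clear_on_rate_limit and err.rate_limited:
        # Keep the healthy pooled connections and skip the immediate check,
        # so no extra connection attempts hit the overloaded server.
        return
    pool.clear()
    pool.request_immediate_check()


err = ConnectError("connection rejected", rate_limited=True)

current = Pool(["c1", "c2", "c3"])
handle_connect_error(current, err, avoid_clear_on_rate_limit=False)

proposed = Pool(["c1", "c2", "c3"])
handle_connect_error(proposed, err, avoid_clear_on_rate_limit=True)

print(len(current.idle), current.immediate_checks)    # 0 1
print(len(proposed.idle), proposed.immediate_checks)  # 3 0
```

      In the first case the client must re-create all three connections plus one for the immediate check, which is the extra pressure described above. In the second case the existing connections stay usable.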

      Who is the affected end user?

      Who are the stakeholders?

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?

      How likely is it that this problem or use case will occur?

      Main path? Edge case?

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?

      Acceptance Criteria

      What specific requirements must be met to consider the design phase complete?

              Assignee:
              Steve Silvester
              Reporter:
              Shane Harvey
              Jib Adegunloye
              Votes:
              0
              Watchers:
              7

                Created:
                Updated: