Avoid clearing the connection pool when the server connection rate limiter triggers


    • Needed

      Summary of necessary driver changes

      • Drivers should update their CMAP & SDAM error handling implementations to NOT clear the pool if connection establishment fails in certain circumstances. Additionally, drivers now add new error labels to establishment errors.

      Commits for syncing spec/prose tests
      (and/or refer to an existing language POC if needed)

      https://github.com/mongodb/specifications/commit/1cafc53b729df3f1ecb144dba78d51da6ffa6a55
      (follow-up test fix) https://github.com/mongodb/specifications/commit/c12c50ed0ff3df870883e2c1d51254e933f9850e

      Context for other referenced/linked tickets
      Key             Status/Resolution   FixVersion
      CDRIVER-6083    Backlog
      CXX-3331        Backlog
      CSHARP-5711     In Code Review
      GODRIVER-3646   Backlog
      JAVA-5949       In Progress
      NODE-7121       Fixed               7.1.0
      PYTHON-5517     Done
      PHPLIB-1713     Won't Do
      RUBY-3700       Backlog
      RUST-2267       In Progress

      Summary

      Avoid clearing the connection pool when the server connection rate limiter triggers.

      Motivation

      When a driver is creating a new connection to an overloaded server and the server rejects it because of its ingress connection rate limiter, the driver reacts by clearing the pool, closing the SDAM monitoring connection, and triggering an immediate SDAM check. This is bad for two reasons: 1) the immediate SDAM check needs to create a new connection, which will likely fail for the same reason; and 2) the existing connections in the pool were healthy and there was no reason to clear them. The client must now repopulate the connection pool, which puts even more connection-creation pressure on the already overloaded node.

      In practice, this behavior acts as a poor circuit breaker: it shuts off traffic to the overloaded server, the server recovers after a time, traffic returns and hits the rate limit again, and traffic is shut off again. This cycle can lead to a potential metastable failure mode.
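      The contrast between the current reaction and the requested one can be sketched as below. This is a minimal, hypothetical Python sketch, not code from any driver: `Server`, `ConnectionEstablishmentError`, `handle_establishment_error`, the `rate_limited` flag and the label string `HypotheticalOverloadLabel` are all illustrative names. It assumes the driver can already tell that a failed connection attempt was rejected by the server's rate limiter; the real conditions and label names are defined in the specification commits linked in this ticket.

```python
from dataclasses import dataclass, field

# Hypothetical label used only for illustration; the real label names are
# defined by the specification commits linked in this ticket.
OVERLOAD_LABEL = "HypotheticalOverloadLabel"


class ConnectionEstablishmentError(Exception):
    """A failure while opening a new pooled connection."""

    def __init__(self, message: str, rate_limited: bool):
        super().__init__(message)
        # Assumption: the driver can classify the failure as a rejection by
        # the server's ingress connection rate limiter (detection not shown).
        self.rate_limited = rate_limited
        self.labels: set[str] = set()


@dataclass
class Server:
    """Toy stand-in for one server's connection pool plus SDAM state."""
    pooled: list[str] = field(default_factory=lambda: ["conn-1", "conn-2"])
    pool_generation: int = 0
    server_type: str = "RSPrimary"
    immediate_check_requested: bool = False

    def clear_pool(self) -> None:
        # Clearing discards existing connections; the client must later
        # re-create them, adding connection-creation pressure.
        self.pool_generation += 1
        self.pooled.clear()


def handle_establishment_error(server: Server, err: ConnectionEstablishmentError) -> None:
    if err.rate_limited:
        # Requested behaviour: label the error, keep the healthy pooled
        # connections, and do not trigger an immediate SDAM check.
        err.labels.add(OVERLOAD_LABEL)
        return
    # Behaviour described in the Motivation: clear the pool, mark the
    # server Unknown, and request an immediate SDAM check.
    server.clear_pool()
    server.server_type = "Unknown"
    server.immediate_check_requested = True
```

      On the rate-limited path the pool keeps its generation and connections, so the client neither rebuilds the pool nor opens an extra monitoring connection against the overloaded node.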

      Who is the affected end user?

      Who are the stakeholders?

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?

      How likely is it that this problem or use case will occur?

      Main path? Edge case?

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?

      Acceptance Criteria

      What specific requirements must be met to consider the design phase complete?

            Assignee:
            Steve Silvester
            Reporter:
            Shane Harvey
            Jib Adegunloye
            Votes:
            0
            Watchers:
            8

              Created:
              Updated: