Drivers / DRIVERS-1571

Direct read/write retries to another mongos if possible

    • Not Needed
      • Drivers should implement the server selection and read/write retry mechanism changes, as well as the new prose tests: specifications@86d961f
      • 2024-02-21: Drivers that have not yet completed this ticket should reference f5bb605 (DRIVERS-2828) for the updated prose test specification.
    • To Do

      Engineer: Dmitry Rybakov
      Summary: When encountering a retryable error, direct the retry attempt to a different mongos if possible.

      2023-06-23

      • Design approved
      • Ready for implementation in Q3 with the Go driver

      2023-06-09

      • Design approved
      • Changes will be ported to the specifications repo

      2023-05-12

      • Design work started
      • Decided against adapting the unified test format to accommodate this project's special test needs, due to the implementation complexity involved
    • Not Needed

      Details TBD

      Key             Status/Resolution    FixVersion
      CDRIVER-4099    Fixed                1.26.0
      CXX-2320        Works as Designed
      CSHARP-3757     Done                 2.26.0
      GODRIVER-2101   Fixed                1.13.0, 1.13.1
      JAVA-4254       Done                 5.2.0
      NODE-3470       Fixed                6.4.0
      MOTOR-792       Duplicate
      PYTHON-2834     Fixed                4.7
      PHPLIB-1459     Fixed                1.20.0
      RUBY-2748       Fixed                2.20.0
      RUST-935        Fixed                2.8.0
      SWIFT-1279      Won't Do

      The following scenarios show why it would be useful to redirect a retried read or write to a different mongos.

      1. A MongoDB sharded cluster deployment may end up in a state where a mongos reports itself as healthy but is unable to execute any queries. The driver retried the failing queries, but in a number of cases it selected the same mongos that had failed in the first place, so the retry also failed (for the same reason as the original attempt) and the error was propagated to the application.
      2. Currently, when the driver is connected to a sharded topology, the server selection spec requires a random server to be selected for each operation. This permits the same failed mongos to be selected for both an operation and its retry, so the query fails even when there are healthy mongoses in the deployment that could have executed it successfully.
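      To make the second scenario concrete: if the retry picks uniformly at random among n eligible mongoses and the failed one stays eligible, it lands on the same mongos with probability 1/n, which is 50% in a deployment with two mongoses. The sketch below is a hypothetical helper, not part of any driver or spec, and uniform selection is a simplifying assumption about how a server is chosen among suitable candidates. It compares that baseline with the proposed behavior of skipping the failed mongos whenever another one is available.

```python
from fractions import Fraction


def same_mongos_retry_probability(num_mongoses: int, exclude_failed: bool) -> Fraction:
    """Probability that a retry is sent to the mongos that failed the first attempt.

    Hypothetical helper. Assumes each eligible mongos is equally likely to be
    picked (a simplification) and that the failed mongos is still eligible.
    """
    if num_mongoses < 1:
        raise ValueError("need at least one mongos")
    if exclude_failed and num_mongoses > 1:
        # Proposed behavior: the failed mongos is skipped when another exists.
        return Fraction(0)
    # Current behavior (or a single mongos): uniform choice over all n candidates.
    return Fraction(1, num_mongoses)


for n in (1, 2, 3, 5):
    print(n, same_mongos_retry_probability(n, False), same_mongos_retry_probability(n, True))
```

      With a single mongos both behaviors must reuse it, so the probability stays 1; the proposal only helps when there is somewhere else to send the retry.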

      The suggested improvement is for the driver, when connected to a sharded cluster, to:

      • Track whether a server selection request is for the first attempt or for a retry.
      • Track the server used for the first attempt.
      • When selecting the server for the retry, if there are multiple eligible mongoses, select randomly from the mongoses other than the one used for the first attempt.
      • Nice to have (bonus): determine whether a mongos is healthy before making the attempt and, if it is unhealthy, exclude it from selection.
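      The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the specification's algorithm or any driver's API: `Mongos`, `RetryableError`, `select_mongos` and `run_with_one_retry` are hypothetical names, the choice among candidates is simplified to a uniform random pick, error classification is reduced to a single exception type, and the optional health check is not modeled. It records which mongos served the first attempt and, on the retry, excludes that mongos only when another eligible one exists.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Mongos:
    """Hypothetical handle for one mongos, identified by its "host:port" address."""
    address: str


class RetryableError(Exception):
    """Hypothetical stand-in for a driver's retryable read/write error."""


def select_mongos(
    eligible: Sequence[Mongos],
    failed: Optional[Mongos] = None,
    rng: Optional[random.Random] = None,
) -> Mongos:
    """Pick a mongos at random, avoiding `failed` if another eligible mongos exists."""
    if not eligible:
        raise RuntimeError("no eligible mongos")
    # Exclude the mongos used for the first attempt, but only if that
    # still leaves at least one candidate; otherwise fall back to it.
    others = [m for m in eligible if m != failed]
    candidates = others if others else list(eligible)
    picker = rng if rng is not None else random
    return picker.choice(candidates)


def run_with_one_retry(
    eligible: Sequence[Mongos],
    operation: Callable[[Mongos], str],
    rng: Optional[random.Random] = None,
) -> str:
    """Run `operation`, retrying once on a retryable error."""
    # First attempt: there is no server to avoid yet.
    first = select_mongos(eligible, failed=None, rng=rng)
    try:
        return operation(first)
    except RetryableError:
        # Retry: remember the mongos that failed and steer away from it.
        second = select_mongos(eligible, failed=first, rng=rng)
        return operation(second)


# Usage: simulate mongos-a failing every query, as in scenario 1.
a, b = Mongos("mongos-a:27017"), Mongos("mongos-b:27017")


def op(server: Mongos) -> str:
    if server == a:
        raise RetryableError(server.address)
    return f"ok from {server.address}"


print(run_with_one_retry([a, b], op, rng=random.Random(0)))  # prints "ok from mongos-b:27017"
```

      Falling back to the failed mongos when it is the only candidate keeps single-mongos deployments retrying as before; with two mongoses where one fails every query, the example succeeds regardless of which mongos the first attempt picks.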

      Cast of Characters:
      Product Manager for Feature: alex.bevilacqua@mongodb.com
      Program Manager: tom.selander@mongodb.com
      Engineering Lead: dmitry.rybakov@mongodb.com

      Assignee: Dmitry Rybakov (dmitry.rybakov@mongodb.com)
      Reporter: Oleg Pudeyev, oleg.pudeyev@mongodb.com (Inactive)
      Jeffrey Yemin, Tom Selander, Alex Bevilacqua
      Votes: 6
      Watchers: 32
