Node.js Driver / NODE-3470

Direct read/write retries to another mongos if possible

      DRIVERS-1571:

      • Drivers should implement the changes to the server selection and read/write retry mechanisms, as well as the new prose tests: specifications@86d961f
      • 2024-02-21: Drivers that have not yet completed this ticket should reference f5bb605 (DRIVERS-2828) for updated prose test specification.
      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      There are several scenarios in which it would be useful to redirect reads or writes to a different mongos.

      1. In a MongoDB sharded cluster deployment, a mongos may report itself as healthy while being unable to execute any queries. When the driver retries a failing query, it can select the same mongos that failed in the first place, so the retry fails for the same reason as the original attempt and the error is propagated to the application.
      2. Currently, when the topology is sharded, the server selection spec requires a random server to be selected for each operation. This permits the same failed mongos to be selected for both an operation and its retry, so the query fails even when the deployment contains healthy mongoses that could have executed it successfully.

      The suggested improvement is for the driver, when in sharded cluster topology, to:

      • Track whether a server selection request is for the first attempt or for a retry.
      • Track the server used for the first attempt.
      • When selecting the server for the retry, if there are multiple eligible mongoses, select randomly from the mongoses other than the one used for the first attempt.
      • Nice to have: determine whether a mongos is healthy before making the attempt and, if it is unhealthy, exclude it from selection.
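The first three points above can be sketched as a small retry wrapper. This is a minimal illustration, not the driver's implementation: `Mongos`, `Operation`, `pickRandom` and `runWithOneRetry` are hypothetical names, and every error is treated as retryable, which real retry logic does not do.

```typescript
// Hypothetical types for illustration; not the driver's actual classes.
type Mongos = { address: string };
type Operation<T> = (server: Mongos) => T;

// Picks a uniformly random element from a non-empty array.
function pickRandom<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

// Runs `op` once and, if it throws, retries once.
// The retry prefers any mongos other than the one used for the first attempt.
function runWithOneRetry<T>(mongoses: Mongos[], op: Operation<T>): T {
  if (mongoses.length === 0) throw new Error('no eligible mongos');

  // First attempt: remember which server was used.
  const first = pickRandom(mongoses);
  try {
    return op(first);
  } catch {
    // Retry: exclude the first server when at least one other mongos exists.
    const others = mongoses.filter(m => m.address !== first.address);
    const retryTarget = others.length > 0 ? pickRandom(others) : first;
    return op(retryTarget);
  }
}
```

With two mongoses the retry always lands on the one not used first; with a single mongos the retry reuses it.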

       

      Acceptance Criteria:

      Implementation:

      • Update server selection to accept deprioritised servers and, when the topology is sharded, to not select from them if other servers are present.
      • When no other servers are present, a deprioritised server must be selected.
      • When retrying a read or write, record the previously selected server and pass it to server selection in the array of deprioritised servers.
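The selection rule described above might look roughly like the following. It is a sketch under stated assumptions: `TopologyType`, `ServerDescription` and `selectServer` are hypothetical, simplified shapes, and the sketch ignores read preference and latency-window filtering, which real server selection also applies.

```typescript
// Hypothetical shapes for illustration; the driver's real types differ.
type TopologyType = 'Sharded' | 'ReplicaSetWithPrimary' | 'Single';
interface ServerDescription { address: string }

function selectServer(
  topologyType: TopologyType,
  candidates: ServerDescription[],
  deprioritized: ServerDescription[] = [],
  random: () => number = Math.random // injectable so tests can be deterministic
): ServerDescription {
  if (candidates.length === 0) throw new Error('no servers available');

  let eligible = candidates;
  // Deprioritisation only applies to sharded topologies.
  if (topologyType === 'Sharded' && deprioritized.length > 0) {
    const preferred = candidates.filter(
      c => !deprioritized.some(d => d.address === c.address)
    );
    // Fall back to the full list when every candidate is deprioritised.
    if (preferred.length > 0) eligible = preferred;
  }
  return eligible[Math.floor(random() * eligible.length)];
}
```

A retry would then call something like `selectServer('Sharded', mongoses, [previousServer])`, while the first attempt omits the third argument.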

      Testing:

      Unit Tests

      Server Selection (https://github.com/mongodb/node-mongodb-native/blob/main/test/unit/sdam/server_selection.test.js):

      • Test that server selection can take an array of deprioritised servers and that it correctly handles the following cases:
        • Deprioritised servers are excluded from the result when the topology is sharded and other servers are available.
        • Deprioritised servers are returned in the result when the topology is sharded and no other servers exist.
        • Deprioritised servers can be returned in the result when the topology is not sharded, even if other servers are available.
        • Server selection can handle deprioritised servers not being provided (both when the argument is undefined and when it is an empty array).
          NOTE: For each scenario, run server selection X times to account for the randomness of server selection and to confirm that each expected result occurs for the right reason (X = 10 will work for 2 servers). Make sure that the server descriptions themselves are otherwise equivalent between the deprioritised and non-deprioritised servers.
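The repeated-run idea from the note can be sketched with a small helper. Everything here is hypothetical scaffolding: `shardedSelector` is a stand-in with sharded-topology behaviour rather than the driver's selector, and `collectSelections` is an illustrative helper, not part of the existing test file.

```typescript
// Hypothetical minimal shape; real server descriptions carry more fields.
interface ServerDescription { address: string }
type Selector = (
  candidates: ServerDescription[],
  deprioritized?: ServerDescription[]
) => ServerDescription;

// Stand-in selector with sharded-topology behaviour (an assumption for this sketch).
function shardedSelector(
  candidates: ServerDescription[],
  deprioritized: ServerDescription[] = []
): ServerDescription {
  const preferred = candidates.filter(
    c => !deprioritized.some(d => d.address === c.address)
  );
  const pool = preferred.length > 0 ? preferred : candidates;
  return pool[Math.floor(Math.random() * pool.length)];
}

// Runs selection `runs` times and returns every selected address, in order.
function collectSelections(
  select: Selector,
  candidates: ServerDescription[],
  deprioritized: ServerDescription[] | undefined,
  runs: number
): string[] {
  const seen: string[] = [];
  for (let i = 0; i < runs; i++) {
    seen.push(select(candidates, deprioritized).address);
  }
  return seen;
}
```

With two otherwise equivalent servers, a selector that ignored deprioritisation would avoid a given server in all 10 runs only with probability (1/2)^10, roughly 0.1%, so an "excluded" assertion over `collectSelections(shardedSelector, [a, b], [a], 10)` is very unlikely to pass by chance.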

      Prose tests

      • Two new tests for retryable writes
        • Test that, in a sharded cluster, writes are retried on a different mongos if one is available
        • Test that, in a sharded cluster, writes are retried on the same mongos if no other is available
      • Two new tests for retryable reads
        • Retryable Reads Are Retried on a Different mongos if One is Available
        • Retryable Reads Are Retried on the Same mongos if No Others are Available
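The outcomes these prose tests look for can be illustrated with an in-memory simulation. This is only a model of the expected behaviour, not the prose tests themselves: `FakeMongos`, `CommandEvent`, `runCommand` and `attemptWithRetry` are hypothetical, and the authoritative steps are in the specifications commits referenced by DRIVERS-1571 and DRIVERS-2828.

```typescript
// Hypothetical event and server shapes for the simulation.
type CommandEvent = { kind: 'failed' | 'succeeded'; address: string };
interface FakeMongos { address: string; failuresLeft: number }

// Records an event for the command; throws while the fake mongos has failures left.
function runCommand(m: FakeMongos, events: CommandEvent[]): void {
  if (m.failuresLeft > 0) {
    m.failuresLeft -= 1;
    events.push({ kind: 'failed', address: m.address });
    throw new Error('simulated retryable error on ' + m.address);
  }
  events.push({ kind: 'succeeded', address: m.address });
}

// One attempt plus one retry; the retry avoids the first mongos when possible.
function attemptWithRetry(mongoses: FakeMongos[]): CommandEvent[] {
  const events: CommandEvent[] = [];
  const first = mongoses[Math.floor(Math.random() * mongoses.length)];
  try {
    runCommand(first, events);
  } catch {
    const others = mongoses.filter(m => m !== first);
    const target =
      others.length > 0 ? others[Math.floor(Math.random() * others.length)] : first;
    try {
      runCommand(target, events);
    } catch {
      // The retry failed as well; both failures are already recorded.
    }
  }
  return events;
}
```

When two fake mongoses are each set to fail once, the recorded events are two failures on different addresses. When a single fake mongos is set to fail once, the events are one failure followed by one success on the same address.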

      Won't Do:

      • Determining whether a mongos is healthy, because the spec does not define what that means

            Assignee:
            durran.jordan@mongodb.com Durran Jordan
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team
            Alena Khineika