Client Backpressure: overload retry policy


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major - P3
    • 5.7.0
    • Affects Version/s: None
    • Component/s: Retryability
    • None
    • Client Backpressure 2
    • Java Drivers
    • Needed
      Please wait until the ticket is done. Only then will we be able to provide you with the documentation changes needed.

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?
    • In Progress
    • None
    • 9
    • 12
    • 15
    • 100
    • 66
    • 🔴 Roadblock

      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-Mar-26:

      • Timeline updates
        • Set "Cost to Date" to 11 + 1 = 12.
          • This is because Valentin Kovalenko spent 1 week on the epic, and Slav Babanin did not spend any time on it.
        • "Final Cost Estimate" did not change.
          • Valentin Kovalenko will likely have to increase it in the future because the estimate turned out to be wrong:
            • 1 of the 3 weeks estimated for Valentin Kovalenko's work was spent on implementing prose tests. Only 15 weeks (Final Cost Estimate) - 12 weeks (Cost to Date) - 1 week (estimated cost left for Slav Babanin) = 2 weeks of estimated cost are left for Valentin Kovalenko to implement the driver logic.
        • Set "End date" from 2026-Apr-24 to 2026-Mar-26 + (Final Cost Estimate - Cost to Date) + 2 weeks = 2026-Mar-26 + (15 weeks - 12 weeks) + 2 weeks = 2026-May-01.
          • The (2026-Mar-26 + (Final Cost Estimate - Cost to Date) = 2026-Apr-17) part gives the estimate for when all the required code changes are done (though, not exactly, because "Final Cost Estimate" includes the cost of addressing reviews in the PRs created within this epic).
          • The (+ 2 weeks) part takes into account that there will be reviews by other people, which do not increase "Final Cost Estimate" but do affect "End date".
        • Set "Confidence Status" to "Roadblock" (Red).
          • Because Valentin Kovalenko moved "End date" into the future.
      • What was accomplished since the last update?
        • Valentin Kovalenko
          • Started JAVA-5956: Exponential backoff and jitter in retry loops.
          • Created PR "Add MongoException.SYSTEM_OVERLOADED_ERROR_LABEL/RETRYABLE_ERROR_LABEL" #1926.
          • Created commit "Implement prose backpressure tests". It implements prose tests, but a PR cannot be created until PR "JAVA-5950 Update Transactions Convenient API with exponential backoff on retries" #1899 is merged.
          • Started JAVA-6055: Clarify `NoWritesPerformed` error label behavior when multiple retries occur.
          • Created PR "JAVA-6055 Implement prose backpressure retryable writes tests" #1929.
        • Slav Babanin
          • Nothing within this epic.
      • What's the focus over the next two weeks?
        • Valentin Kovalenko
          • Continue working on JAVA-5956: Exponential backoff and jitter in retry loops. Start implementing driver logic.
        • Slav Babanin
          • Add tests for JAVA-6105: Server selection deprioritization only for overload errors on replica sets and include them and the implementation in "JAVA-6021,JAVA-6074,JAVA-6105,JAVA-6114: Add support for server selection's deprioritized servers to all topologies." #1860 once review of the earlier logic is complete. This is a roll over from the previous project report.
      • Any risks/blockers/impediments?
        • No.
      • Notes
        • Some other, mostly backpressure-related work that was done by Valentin Kovalenko
          • Kept the description of JAVA-6019 Client Backpressure: overload retry policy up to date with all the changes in the backpressure project.
          • Discovered that a prose implementation was missed in closed JAVA-6035 Update Handshake to signal support of backpressure. Requested to add it: https://github.com/mongodb/mongo-java-driver/pull/1918#discussion_r2984460035
          • Closed JAVA-6095 Clarify phase description for "Network timeouts test" spec test. See https://jira.mongodb.org/browse/JAVA-6095?focusedCommentId=8258829&focusedId=8258829&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-8258829
          • Discovered problems introduced in DRIVERS-3391 Clarify expected error if backoff exceeds CSOT's deadline in withTransaction. They should be addressed, I'll discuss this with Nabil, see https://github.com/mongodb/mongo-java-driver/pull/1899#discussion_r2931795298
          • Updated the testing/resources/specifications submodule in the driver. Skipped tests that had to be skipped and updated the relevant Jira tickets accordingly. PR https://github.com/mongodb/mongo-java-driver/pull/1915
          • Created PR "Client Backpressure" #1918. This is a PR for merging the feature branch backpressure into main. Left a description with some procedural notes for the engineers involved.
          • Created "Fix prose retryable writes test 6, case 3".
          • Re-reviewed PR "JAVA-5950 Update Transactions Convenient API with exponential backoff on retries" #1899.
          • Re-reviewed PR "Add Javadoc to specify that CSOT does not limit socket writes" #1791.
          • Reviewed PR "Run tests on JDK 17 and JDK 25" #153.
          • Reviewed PR "Add RetryableWriteError to write concern error in bulk writes." #1880.
          • Re-reviewed (partial review) PR "(DOCSP-54006) IWM Landing Page, Operation Rate Limiter, and Load-Shedding Errors" #17281.
          • Reviewed PR "JAVA-6144 increase wait time for thread pool shutdown" #1920.
          • Reviewed PR "JAVA-6148 wait for insert operations first" #1927.
          • Reviewed (partial review) PR "JAVA-5949 preserve connection pool on backpressure errors when establishing connections" #1900.
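
The "End date" arithmetic from the 2026-Mar-26 timeline update can be sketched in Java as follows. This is only an illustration, not part of the reporting process: the class and method names are hypothetical, and rounding each result forward to the Friday of its week is an assumption made here because 2026-Mar-26 is a Thursday while the dates quoted in the update (2026-Apr-17, 2026-May-01) are Fridays.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper that mirrors the "End date" arithmetic of the update.
public final class EndDateEstimate {
    private EndDateEstimate() {
    }

    // Assumption: a date is rounded forward to the Friday of the week it falls in.
    public static LocalDate endOfWorkWeek(LocalDate date) {
        return date.with(TemporalAdjusters.nextOrSame(DayOfWeek.FRIDAY));
    }

    // Report date + (Final Cost Estimate - Cost to Date):
    // roughly when all the required code changes are expected to be done.
    public static LocalDate codeComplete(LocalDate reportDate, int finalCostWeeks, int costToDateWeeks) {
        return endOfWorkWeek(reportDate.plusWeeks(finalCostWeeks - costToDateWeeks));
    }

    // Same as above, plus a buffer for reviews by other people,
    // which moves "End date" without increasing "Final Cost Estimate".
    public static LocalDate endDate(
            LocalDate reportDate, int finalCostWeeks, int costToDateWeeks, int reviewBufferWeeks) {
        return endOfWorkWeek(
                reportDate.plusWeeks(finalCostWeeks - costToDateWeeks + reviewBufferWeeks));
    }

    public static void main(String[] args) {
        LocalDate reportDate = LocalDate.of(2026, 3, 26);
        System.out.println(codeComplete(reportDate, 15, 12)); // 2026-04-17
        System.out.println(endDate(reportDate, 15, 12, 2));   // 2026-05-01
    }
}
```

Without the Friday rounding, plain calendar arithmetic would give 2026-Apr-16 and 2026-Apr-30, one day earlier than the dates stated in the update.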

      2026-03-02 - 🔴 Roadblock
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-12:

      • Timeline updates
        • The "End date", "Final Cost Estimate" updating instructions are unclear, especially for the "Final Cost Estimate" (https://wiki.corp.mongodb.com/spaces/DRIVERS/pages/214438893/DBX+Bi-Weekly+Project+Updates+Process#DBXBiWeeklyProjectUpdatesProcess-FieldstoUpdate).
          • Also, those instructions refer to "current estimates", "projected final cost", "target end date" - there are no such fields.
            There are "Scope Cost Estimate", "Final Cost Estimate", "Target end", "End date" fields, and it is unclear which of these are meant above, if any.
            • It would have been helpful if the instructions referred to the fields by their exact names, and used formatting that allows readers to distinguish field names from the rest of the text.
            • It may be a good idea to name all the fields such that the names help to understand the meaning, and also are consistent with each other.
          • The "Confidence Status" updating instruction "Confidence status is to be applied against the date reported in the End date field." seems incorrect:
            • The "End date" is allowed to change as the project progresses, to reflect the date by which we estimate to complete the project. Therefore, the instruction quoted above means that the "Confidence Status" should always be green, which is certainly not what the instruction is supposed to convey.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Finished reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862.
          • Re-reviewed JAVA-5950: Update Transactions Convenient API with exponential backoff on retries #1852.
          • Partially reviewed JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
        • Slav Babanin:
          • Added unified tests to JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
          • Reviewed DRIVERS-3391: Clarify withTransaction CSOT timeout error with backoff+jitter #1890.
      • What's the focus over the next two weeks?
        • During the second of the next two weeks, I am the first responder.
        • Finish reviewing JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
        • Re-review JAVA-5950: Update Transactions Convenient API with exponential backoff on retries #1852.
        • Re-review DRIVERS-3391: Clarify withTransaction CSOT timeout error with backoff+jitter #1890.
        • Continue reviewing other relevant PRs as needed.
        • No implementation work is expected.
      • Any risks/blockers/impediments?
        • No.

      2026-02-18 - 🔴 Roadblock
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-29:

      • Timeline updates
        • "Target end", "Scope Cost Estimate" will be updated again after discussing with the team.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. Three leads, the PR author and I have been having regular meetings. I believe we are getting closer to a state at which I will be able to approve the requirements expressed in the PR.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries (https://github.com/mongodb/mongo-java-driver/pull/1852).
        • Slav Babanin:
          • Worked on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Added additional test cases and addressed merge conflicts.
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors, hopefully approve.
        • If DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors is done and ready to be implemented, continue working on JAVA-6025: Client Backpressure: design overload retry policy concurrently with JAVA-5956: Exponential backoff and jitter in retry loops.
        • Work on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Resolve new merge conflicts and incorporate spec test changes.
        • Continue reviewing the existing JAVA PRs done as part of JAVA-5942: Client Backpressure, JAVA-6019: Client Backpressure: overload retry policy as needed.
      • Any risks/blockers/impediments?
        • No.

      2026-02-03 - 🔴 Roadblock
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-29:

      • Timeline updates
        • "Target end", "Scope Cost Estimate" will be updated again after discussing with the team.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. Three leads, the PR author and I have been having regular meetings. I believe we are getting closer to a state at which I will be able to approve the requirements expressed in the PR.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries (https://github.com/mongodb/mongo-java-driver/pull/1852).
        • Slav Babanin:
          • Worked on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Added additional test cases and addressed merge conflicts.
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors, hopefully approve.
        • If DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors is done and ready to be implemented, continue working on JAVA-6025: Client Backpressure: design overload retry policy concurrently with JAVA-5956: Exponential backoff and jitter in retry loops.
        • Work on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Resolve new merge conflicts and incorporate spec test changes.
        • Continue reviewing the existing JAVA PRs done as part of JAVA-5942: Client Backpressure, JAVA-6019: Client Backpressure: overload retry policy as needed.
      • Any risks/blockers/impediments?
        • No.

      2026-02-03 - No confidence status provided
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-29:

      • Timeline updates
        • "Target end", "Scope Cost Estimate" will be updated again after discussing with the team.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. Three leads, the PR author and I have been having regular meetings. I believe we are getting closer to a state at which I will be able to approve the requirements expressed in the PR.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries (https://github.com/mongodb/mongo-java-driver/pull/1852).
        • Slav Babanin:
          • Worked on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Added additional test cases and addressed merge conflicts.
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors, hopefully approve.
        • If DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors is done and ready to be implemented, continue working on JAVA-6025: Client Backpressure: design overload retry policy concurrently with JAVA-5956: Exponential backoff and jitter in retry loops.
        • Work on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Resolve new merge conflicts and incorporate spec test changes.
        • Continue reviewing the existing JAVA PRs done as part of JAVA-5942: Client Backpressure, JAVA-6019: Client Backpressure: overload retry policy as needed.
      • Any risks/blockers/impediments?
        • No.

      2026-02-01 - No confidence status provided
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-29:

      • Timeline updates
        • "Target end", "Scope Cost Estimate" will be updated again after discussing with the team.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. Three leads, the PR author and I have been having regular meetings. I believe we are getting closer to a state at which I will be able to approve the requirements expressed in the PR.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries (https://github.com/mongodb/mongo-java-driver/pull/1852).
        • Slav Babanin:
          • Worked on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Added additional test cases and addressed merge conflicts.
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors, hopefully approve.
        • If DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors is done and ready to be implemented, continue working on JAVA-6025: Client Backpressure: design overload retry policy concurrently with JAVA-5956: Exponential backoff and jitter in retry loops.
        • Work on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Resolve new merge conflicts and incorporate spec test changes.
        • Continue reviewing the existing JAVA PRs done as part of JAVA-5942: Client Backpressure, JAVA-6019: Client Backpressure: overload retry policy as needed.
      • Any risks/blockers/impediments?
        • No.

      2026-01-07 - No confidence status provided
      Engineer(s): Valentin Kovalenko
      2026-01-01: "End date", "Final Cost Estimate" will be updated some time later after re-evaluating them with the team.

      • Timeline updates
        • See above.
      • What was accomplished since the last update?
        • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. The PR has been actively updated.
        • Reviewed the existing non-draft JAVA PRs done as part of Client Backpressure.
          • JAVA-5950 Update Transactions Convenient API with exponential backoff on retries
          • [JAVA-6033] ServerHeartbeatSucceededEvent is not fired for initial POLL monitoring #1856
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors.
        • Continue reviewing the existing JAVA PRs done as part of Client Backpressure as needed.
        • Finish JAVA-6025: Client Backpressure: design overload retry policy, provided that the relevant spec requirements are approved, merged, and there is enough time left within the two weeks to finish the design.
      • Any risks/blockers/impediments?
        • The new estimates will be higher than our original estimates, both because of the scope changes and because of the changes in our understanding (though the latter can be seen as a subset of the former).
        • The new estimates are likely to be done without having the specification work completed, let alone having the Java driver design for the overload retry policy completed.
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      This epic exists solely for reporting the work done by valentin.kovalenko@mongodb.com (update: and by slav.babanin@mongodb.com).

      Specification changes

      Scope

      The driver specification changes this epic should implement are the subset of those introduced by DRIVERS-3160 that the "Backpressure Estimate" document refers to as "New retry logic, with token bucket". It is difficult to tell which tickets constitute that subset:

      Everything related

      The last commit I checked at https://github.com/mongodb/specifications/commits/master/ is 290ee48, the last PR I checked at https://github.com/mongodb/specifications/pulls is #1917.

      Design

            Assignee:
            Valentin Kavalenka
            Reporter:
            Valentin Kavalenka
            None
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              20 weeks, 3 days
              None
              None