Client Backpressure: overload retry policy


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major - P3
    • 5.7.0
    • Affects Version/s: None
    • Component/s: Retryability
    • None
    • Client Backpressure 2
    • Java Drivers
    • Needed
      Please wait until the ticket is done. Only then will we be able to provide you with the documentation changes needed.

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?
    • In Progress
    • None
    • 9
    • 12
    • 15
    • 100
    • 66
    • 🔴 Roadblock

      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-Apr-10:

      • Timeline updates
        • "Cost to Date" was not changed.
          • This is because neither of the engineers involved spent noticeable time on the epic.
        • "Final Cost Estimate" did not change.
        • Set "End date" from 2026-May-01 to 2026-Apr-10 + (Final Cost Estimate - Cost to Date) + 2 weeks = 2026-Apr-10 + (15 weeks - 12 weeks) + 2 weeks = 2026-May-15.
          • The (2026-Apr-10 + (Final Cost Estimate - Cost to Date) = 2026-May-01) part gives the estimate for when all the required code changes are done (though not exactly, because "Final Cost Estimate" includes the cost of addressing reviews in the PRs created within this epic).
          • The (+ 2 weeks) part takes into account that there will be reviews by other people, which does not increase "Final Cost Estimate" but affects "End date".
        • "Confidence Status" stayed "Red".
          • Because Valentin Kovalenko extended "End date".
            The only potential path to "Yellow" is for the engineers involved to work on this epic exclusively. That is unrealistic, especially taking into account how much time it was possible to dedicate to the epic historically.
      • What was accomplished since the last update?
        • Valentin Kovalenko
          • Moved through review and merged:
            • PR Add MongoException.SYSTEM_OVERLOADED_ERROR_LABEL/RETRYABLE_ERROR_LABEL #1926.
            • PR JAVA-6055 Implement prose backpressure retryable writes tests #1929.
          • Successfully investigated unexpected failures of the new tests.
        • Slav Babanin
          • Nothing within this epic.
      • What's the focus over the next two weeks?
        • Valentin Kovalenko
          • Continue working on JAVA-5956: Exponential backoff and jitter in retry loops.
            • Address the recent specification test changes by modifying / adding tests to the commit Implement prose backpressure tests. A PR for this work cannot be created until PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1899 is merged.
            • Start implementing driver logic. This is a rollover from the previous project report.
        • Slav Babanin
          • Add tests for JAVA-6105: Server selection deprioritization only for overload errors on replica sets and include them and the implementation in JAVA-6021,JAVA-6074,JAVA-6105,JAVA-6114: Add support for server selection's deprioritized servers to all topologies. #1860 once review of the earlier logic is complete. This is a rollover from the previous project report.
      • Any risks/blockers/impediments?
        • Yes. Valentin Kovalenko moved this previously existing risk from the "Timeline updates" section to this section.
          • It is not unlikely that Valentin Kovalenko will have to increase "Final Cost Estimate" in the future because the estimate turned out to be wrong:
            • 1 out of 3 weeks estimated for Valentin Kovalenko's work was spent on implementing prose tests, and more work on prose tests is needed due to the recent specification changes. It is unrealistic that Valentin Kovalenko can complete the rest of the work in the 2 estimated weeks that remain.
      • Notes
        • Some other, mostly backpressure-related work done by Valentin Kovalenko:
          • Kept the description of JAVA-6019 Client Backpressure: overload retry policy up to date with all the changes in the backpressure project. Also updated the description of the PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1899 to reflect the new changes it should implement.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries #1899.
          • Created, moved through review, and merged PR Fix com.mongodb.client.FailPoint.enable #1931.
          • Reviewed and participated in extensive discussions: PR DRIVERS-3427 - Finalize client backpressure implementation for phase 1 rollout #1919.
          • Reviewed PR DRIVERS-3436 - Refine withTransaction timeout error wrapping semantics and label propagation in spec and prose tests #1920.
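
      The "End date" arithmetic from the 2026-Apr-10 timeline update can be checked with a short sketch. The class name, method name, and parameter names below are hypothetical and exist only for illustration; the numbers (report date 2026-Apr-10, 15 weeks "Final Cost Estimate", 12 weeks "Cost to Date", 2 weeks of review buffer) are taken from the update itself.

```java
import java.time.LocalDate;

// Illustrative only: a hypothetical helper that mirrors the formula in the update,
// "End date" = report date + (Final Cost Estimate - Cost to Date) + review buffer.
final class EndDateEstimate {

    static LocalDate estimateEndDate(
            LocalDate reportDate,
            int finalCostEstimateWeeks,
            int costToDateWeeks,
            int reviewBufferWeeks) {
        // Remaining implementation effort, in weeks.
        int remainingWeeks = finalCostEstimateWeeks - costToDateWeeks;
        // Code-complete date, then the extra time for reviews by other people.
        return reportDate.plusWeeks(remainingWeeks).plusWeeks(reviewBufferWeeks);
    }

    public static void main(String[] args) {
        LocalDate reportDate = LocalDate.of(2026, 4, 10);

        // Without the review buffer: when the code changes are estimated to be done.
        System.out.println(estimateEndDate(reportDate, 15, 12, 0)); // 2026-05-01

        // With the 2-week review buffer: the new "End date".
        System.out.println(estimateEndDate(reportDate, 15, 12, 2)); // 2026-05-15
    }
}
```

      Both intermediate results match the dates quoted in the update: 2026-May-01 for the code changes and 2026-May-15 for the "End date".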

      2026-03-02 - 🔴 Roadblock
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-12:

      • Timeline updates
        • The "End date", "Final Cost Estimate" updating instructions are unclear, especially for the "Final Cost Estimate" (https://wiki.corp.mongodb.com/spaces/DRIVERS/pages/214438893/DBX+Bi-Weekly+Project+Updates+Process#DBXBiWeeklyProjectUpdatesProcess-FieldstoUpdate).
          • Also, those instructions refer to "current estimates", "projected final cost", "target end date" - there are no such fields.
            There are "Scope Cost Estimate", "Final Cost Estimate", "Target end", "End date" fields, and it is unclear which of these are meant above, if any.
            • It would have been helpful if the instructions referred to the fields by their exact names, and used formatting that allows readers to distinguish field names from the rest of the text.
            • It may be a good idea to name all the fields such that the names help to understand the meaning, and also are consistent with each other.
          • The "Confidence Status" updating instruction "Confidence status is to be applied against the date reported in the End date field." seems incorrect:
            • The "End date" is allowed to change as the project progresses, to reflect the date by which we estimate to complete the project. Therefore, the instruction quoted above means that the "Confidence Status" should always be green, which is certainly not what the instruction is supposed to convey.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Finished reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors #1862.
          • Re-reviewed JAVA-5950: Update Transactions Convenient API with exponential backoff on retries #1852.
          • Partially reviewed JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
        • Slav Babanin:
          • Added unified tests to JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
          • Reviewed DRIVERS-3391: Clarify withTransaction CSOT timeout error with backoff+jitter #1890.
      • What's the focus over the next two weeks?
        • During the second of the next two weeks, I am the first responder.
        • Finish reviewing JAVA-6021, JAVA-6074: Add support for server selection's deprioritized servers to all topologies. #1860.
        • Re-review JAVA-5950: Update Transactions Convenient API with exponential backoff on retries #1852.
        • Re-review DRIVERS-3391: Clarify withTransaction CSOT timeout error with backoff+jitter #1890.
        • Continue reviewing other relevant PRs as needed.
        • No implementation work is expected.
      • Any risks/blockers/impediments?
        • No.

      2026-02-18 - 🔴 Roadblock
      2026-02-03 - 🔴 Roadblock
      2026-02-03 - No confidence status provided
      2026-02-01 - No confidence status provided
      Engineer(s): Valentin Kovalenko, Slav Babanin
      2026-01-29:

      • Timeline updates
        • "Target end", "Scope Cost Estimate" will be updated again after discussing with the team.
      • What was accomplished since the last update?
        • Valentin Kovalenko:
          • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. Three leads, the PR author and I have been having regular meetings. I believe we are getting closer to a state in which I will be able to approve the requirements expressed in the PR.
          • Re-reviewed PR JAVA-5950 Update Transactions Convenient API with exponential backoff on retries (https://github.com/mongodb/mongo-java-driver/pull/1852).
        • Slav Babanin:
          • Worked on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Added additional test cases and addressed merge conflicts.
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors, hopefully approve.
        • If DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors is done and ready to be implemented, continue working on JAVA-6025: Client Backpressure: design overload retry policy concurrently with JAVA-5956: Exponential backoff and jitter in retry loops.
        • Work on JAVA-6021: Add support for server selection's deprioritized servers to all topologies. Resolve new merge conflicts and incorporate spec test changes.
        • Continue reviewing the existing JAVA PRs done as part of JAVA-5942: Client Backpressure, JAVA-6019: Client Backpressure: overload retry policy as needed.
      • Any risks/blockers/impediments?
        • No.

      2026-01-07 - No confidence status provided
      Engineer(s): Valentin Kovalenko
      2026-01-01: "End date", "Final Cost Estimate" will be updated some time later after re-evaluating them with the team.

      • Timeline updates
        • See above.
      • What was accomplished since the last update?
        • Continued reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors. The PR has been actively updated.
        • Reviewed the existing non-draft JAVA PRs done as part of Client Backpressure.
          • JAVA-5950 Update Transactions Convenient API with exponential backoff on retries
          • [JAVA-6033] ServerHeartbeatSucceededEvent is not fired for initial POLL monitoring #1856
      • What's the focus over the next two weeks?
        • Continue reviewing DRIVERS-3239: Add exponential backoff to operation retry loop for server overloaded errors.
        • Continue reviewing the existing JAVA PRs done as part of Client Backpressure as needed.
        • Finish JAVA-6025: Client Backpressure: design overload retry policy, provided that the relevant spec requirements are approved, merged, and there is enough time left within the two weeks to finish the design.
      • Any risks/blockers/impediments?
        • The new estimates will be higher than our original estimates, both because of the scope changes and because of the changes in our understanding (though the latter can be seen as a subset of the former).
        • The new estimates are likely to be done without having the specification work completed, let alone having the Java driver design for the overload retry policy completed.

      This epic exists solely for reporting the work done by valentin.kovalenko@mongodb.com and, as of a later update, slav.babanin@mongodb.com.

      Specification changes

      Scope

      The driver specification changes this epic should implement are the subset of those introduced by DRIVERS-3160 that is referred to as "New retry logic, with token bucket" in the "Backpressure Estimate" document. It is difficult to tell which tickets constitute that subset:

      Everything related

      The last commit I checked at https://github.com/mongodb/specifications/commits/master/ is 8a8a7c5; the last open PR I checked at https://github.com/mongodb/specifications/pulls is #1920.

      Design

            Assignee: Valentin Kavalenka
            Reporter: Valentin Kavalenka
            Votes: 0
            Watchers: 1

              Created:
              Updated:
              22 weeks, 3 days