Investigate find_one_and_update regressions due to SERVER-124159


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Checkpoints
    • Storage Engines - Server Integration
    • 297.516
    • SE Persistence backlog

      SERVER-124519 enabled parallel checkpoints for disaggregated storage. By some metrics this was a net improvement (20 improvements versus 15 regressions), but some of the regressions are severe, on the order of 500%.

      There was some discussion of these results when the ticket was merged, but that discussion did not explain why the regressions were acceptable. There was also speculation that they were just noise, but that has turned out not to be the case: the 500% regressions have proven to be "sticky".

      The good news is that all of the large regressions come from the find_one_and_update tasks, and only on the 11-node disagg variant. We should investigate what is happening in these tasks on that variant so that we at least have a reasonable explanation and, ideally, a fix.
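
      As a starting point for triage, the comparison between runs with parallel checkpoints enabled and disabled could be tabulated with a small helper like the one below. This is only a sketch: the `Comparison` type, the `triage` function, the `threshold_pct` cutoff, and the assumption that the metric is lower-is-better (for example, latency) are all hypothetical and not taken from the perf tooling. The numbers in the usage note are made up purely to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """One task measured before and after a change (hypothetical shape)."""
    task: str
    baseline: float   # metric with parallel checkpoints disabled
    candidate: float  # metric with parallel checkpoints enabled


def percent_change(baseline: float, candidate: float) -> float:
    """Relative change of candidate versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def triage(comparisons, lower_is_better=True, threshold_pct=100.0):
    """Split tasks into improvements, regressions, and severe regressions.

    Assumption: with lower_is_better=True (e.g. latency), a positive
    percent change is a regression; with False (e.g. throughput), a
    negative percent change is a regression.
    """
    result = {"improvements": [], "regressions": [], "severe": []}
    for c in comparisons:
        change = percent_change(c.baseline, c.candidate)
        # Normalise so that a positive value always means "got worse".
        worse_by = change if lower_is_better else -change
        if worse_by > 0:
            result["regressions"].append(c.task)
            if worse_by >= threshold_pct:
                result["severe"].append(c.task)
        elif worse_by < 0:
            result["improvements"].append(c.task)
    return result
```

      For example, with illustrative inputs `Comparison("find_one_and_update", 10.0, 60.0)`, `Comparison("task_a", 8.0, 6.0)`, and `Comparison("task_b", 4.0, 5.0)`, the first task shows a 500% change and is the only one reported under `severe`, while `task_b` is a regression below the cutoff and `task_a` is an improvement.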

        1. screenshot-3.png (113 kB)
        2. Screenshot 2026-05-29 at 7.21.09 pm.png (128 kB)
        3. Screenshot_20260519_162352.png (39 kB)
        4. pc_enabled.png (452 kB)
        5. pc_disabled.png (439 kB)
        6. parallel_ckpt_enabled_vs_disabled.t2 (6 kB)
        7. image-2026-05-29-20-32-22-951.png (182 kB)
        8. image-2026-05-29-20-13-03-298.png (135 kB)
        9. image-2026-05-29-20-08-42-154.png (300 kB)
        10. image-2026-05-29-20-08-03-194.png (288 kB)
        11. image-2026-05-29-20-02-31-544.png (48 kB)
        12. image-2026-05-29-20-01-17-134.png (33 kB)
        13. image-2026-05-29-19-33-30-189.png (89 kB)
        14. image-2026-05-29-19-11-07-055.png (86 kB)
        15. image-2026-05-29-19-08-05-308.png (99 kB)
        16. image-2026-05-29-08-57-35-675.png (90 kB)
        17. image-2026-05-27-16-34-40-975.png (40 kB)
        18. image-2026-05-27-16-32-22-554.png (419 kB)
        19. image-2026-05-27-16-16-16-887.png (90 kB)
        20. image-2026-05-27-16-14-57-590.png (84 kB)
        21. image-2026-05-27-16-08-26-333.png (160 kB)
        22. image-2026-05-27-16-06-49-818.png (202 kB)

            Assignee: [DO NOT USE] Backlog - Storage Engines Team
            Reporter: Will Korteland
            Votes: 0
            Watchers: 6
