ShardingDDLCoordinatorService's check for ongoing coordinators should not depend on asynchronous cleanup work


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.3.0-rc0
    • Affects Version/s: 8.3.0-rc0, 8.2.0
    • Component/s: None
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.2
    • CAR Team 2025-09-29, CAR Team 2025-10-13
    • 0
    • 🟥 DDL

      Sharded DDL commands generally create a ShardingDDLCoordinator, then block until it finishes by waiting on its completion future. Once the command is unblocked, it returns the result to the user, who may run follow-up commands with the expectation that the operation is fully complete.

      However, the deactivation of the ShardingDDLCoordinator in the ShardingDDLCoordinatorService waits concurrently on this same future, and there is no guarantee that it gets scheduled in a timely manner. The function areAllCoordinatorsOfTypeFinished, introduced in 8.2, checks the in-memory state of the ShardingDDLCoordinatorService rather than any state that is guaranteed to be updated once the completion future is ready.

      Because of this, code that uses areAllCoordinatorsOfTypeFinished may still see a coordinator as active after its completion future is ready, causing unexpected conflicts when commands are run sequentially. In practice, this only reproduces when commands (for example, addShard followed by setFCV) are run immediately one after the other, such as from a script.
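
      The race described above can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: ToyCoordinatorService, run_scenario, and the cleanup_may_run event are hypothetical names made up for illustration, and the event stands in for the cleanup task being scheduled late. The command and the cleanup both wait on the same future, but the check reads only the in-memory registry.

```python
import threading
from concurrent.futures import Future


class ToyCoordinatorService:
    """Hypothetical stand-in for the service's in-memory coordinator registry."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def register(self, name):
        with self._lock:
            self._active.add(name)

    def deactivate(self, name):
        with self._lock:
            self._active.discard(name)

    def are_all_coordinators_finished(self):
        # Reads only the in-memory registry, like the check described above.
        with self._lock:
            return not self._active


def run_scenario():
    service = ToyCoordinatorService()
    completion = Future()
    # Assumption: this event models the cleanup task being scheduled late.
    cleanup_may_run = threading.Event()

    service.register("addShard")

    def cleanup():
        completion.result()       # cleanup waits on the same future as the command
        cleanup_may_run.wait()    # ...but runs only once it finally gets scheduled
        service.deactivate("addShard")

    cleanup_thread = threading.Thread(target=cleanup)
    cleanup_thread.start()

    completion.set_result("ok")   # the coordinator finishes
    completion.result()           # the command unblocks and returns to the user

    # A follow-up command that checks immediately still sees the coordinator.
    seen_right_after = service.are_all_coordinators_finished()

    cleanup_may_run.set()
    cleanup_thread.join()
    seen_after_cleanup = service.are_all_coordinators_finished()
    return seen_right_after, seen_after_cleanup


if __name__ == "__main__":
    print(run_scenario())  # prints (False, True)
```

      In this model the first check returns False even though the completion future is already ready, which mirrors the conflict a follow-up command can hit. A check based on state that is updated before the future becomes ready would not have this window.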

            Assignee:
            Allison Easton
            Reporter:
            Joan Bruguera Micó