Always pass cancelation token from DDL coordinator to RPC for participant commands

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.3.0-rc0
    • Affects Version/s: 8.3.0-rc0
    • Component/s: Catalog, Networking
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2026-03-16
    • 200
    • 🟥 DDL

      Sharding DDL coordinators sometimes use CancellationToken::uncancelable() when sending a participant command to themselves or to another shard, instead of the token associated with the coordinator (for example, when DropCollectionCoordinator sends a _shardsvrDropCollectionParticipant command).

      This can prevent the DDL coordinators from shutting down quickly after a stepdown. This is particularly problematic if:

      • At the time of the stepdown, the DDL coordinator was sending a participant command to itself with an uncancelable token.
      • The same node that just stepped down wins the election.

      This causes a circular wait: the DDL coordinator has an uncancelable RPC waiting for a primary, but the node can't step up as primary until the DDL coordinators from the previous term finish. The wait only resolves once the ReplicaSetMonitor times out.

      We should always pass the coordinator's own cancelation token to the RPC instead of CancellationToken::uncancelable().
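
      The difference between the two tokens can be illustrated with a small standalone sketch. It does not use the server's real classes: CancelToken, CancelSource, waitForPrimary and runParticipantCommand are hypothetical stand-ins written only for this illustration, and the "RPC" is a polling loop whose poll limit plays the role of the ReplicaSetMonitor timeout.

```cpp
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a cancellation token (NOT the server's class).
class CancelToken {
public:
    // A token with no shared state can never be canceled.
    static CancelToken uncancelable() { return CancelToken(nullptr); }
    bool isCanceled() const { return _flag && *_flag; }

private:
    friend class CancelSource;
    explicit CancelToken(std::shared_ptr<bool> flag) : _flag(std::move(flag)) {}
    std::shared_ptr<bool> _flag;
};

// Hypothetical stand-in for the coordinator's cancellation source.
class CancelSource {
public:
    CancelToken token() const { return CancelToken(_flag); }
    void cancel() { *_flag = true; }

private:
    std::shared_ptr<bool> _flag = std::make_shared<bool>(false);
};

enum class WaitResult { kCanceled, kTimedOut };

// Models an RPC that waits for a primary that never appears. It stops early
// only if the token is canceled; otherwise it runs until maxPolls is reached.
// betweenPolls lets the caller inject external events (such as a stepdown).
WaitResult waitForPrimary(const CancelToken& token,
                          int maxPolls,
                          const std::function<void(int)>& betweenPolls) {
    for (int poll = 0; poll < maxPolls; ++poll) {
        if (token.isCanceled()) {
            return WaitResult::kCanceled;
        }
        betweenPolls(poll);
    }
    return WaitResult::kTimedOut;
}

// Simulates a coordinator sending a participant command, with a stepdown
// canceling the coordinator's source while the command is still waiting.
WaitResult runParticipantCommand(bool passCoordinatorToken) {
    CancelSource coordinatorSource;
    CancelToken token = passCoordinatorToken ? coordinatorSource.token()
                                             : CancelToken::uncancelable();
    return waitForPrimary(token, 100, [&coordinatorSource](int poll) {
        if (poll == 3) {
            coordinatorSource.cancel();  // the stepdown happens here
        }
    });
}
```

      In this sketch, runParticipantCommand(true) stops on the first poll after the simulated stepdown, while runParticipantCommand(false) ignores the stepdown and only ends when the poll limit is exhausted.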

            Assignee:
            Joan Bruguera Micó
            Reporter:
            Joan Bruguera Micó
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: