Fail gracefully if authoritative collection refresh is interrupted by stepdown

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 9.0.0-rc0
    • Component/s: Catalog, Replication, Sharding
    • Catalog and Routing
    • CAR Team 2026-05-11
    • 🟦 Shard Catalog

      Fix two issues that can cause a tassert or crash if a stepdown happens during authoritative collection metadata refresh:

      1. If a stepdown interrupts the refresh, the refresh may swallow any error code and keep retrying until it hits the retry limit and tasserts.
        • We should stop retrying once the opCtx has been interrupted.
      2. On stepdown, the replication coordinator interrupts all optime waiters and then invariants that none remain. However, it misses the waiters registered by ReplicationCoordinator::registerWaiterForMajorityReadOpTime (which are used when recovering the authoritative metadata from disk), so the invariant can fail.
        • We should also interrupt those optime waiters.
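
      The two fixes above can be sketched roughly as follows. This is a simplified, standalone model and not the server's actual code: FakeOpCtx, RefreshOutcome, refreshWithRetries, and FakeWaiterLists are hypothetical names introduced only for illustration, and the real refresh and replication coordinator logic is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for an operation context; it only models the interrupt flag.
struct FakeOpCtx {
    bool interrupted = false;
};

enum class RefreshOutcome { kSucceeded, kInterrupted, kRetriesExhausted };

// Runs `attempt` up to `maxAttempts` times. A failed attempt is retried only
// while the operation has not been interrupted, so an interruption surfaces as
// kInterrupted instead of consuming the whole retry budget.
RefreshOutcome refreshWithRetries(const FakeOpCtx& opCtx,
                                  int maxAttempts,
                                  const std::function<bool()>& attempt) {
    if (opCtx.interrupted) {
        return RefreshOutcome::kInterrupted;
    }
    for (int i = 0; i < maxAttempts; ++i) {
        if (attempt()) {
            return RefreshOutcome::kSucceeded;
        }
        // Assumption: a failure seen after an interruption is not worth retrying.
        if (opCtx.interrupted) {
            return RefreshOutcome::kInterrupted;
        }
    }
    return RefreshOutcome::kRetriesExhausted;
}

// Hypothetical model of two separately tracked lists of waiter ids.
struct FakeWaiterLists {
    std::vector<int> opTimeWaiters;
    std::vector<int> majorityReadWaiters;
    std::size_t interruptedCount = 0;

    // Interrupts (here: counts and drops) the waiters in BOTH lists, then
    // checks that none remain, mirroring the "invariant that none remain" step.
    void interruptAllOnStepdown() {
        for (std::vector<int>* list : {&opTimeWaiters, &majorityReadWaiters}) {
            interruptedCount += list->size();
            list->clear();
        }
        assert(opTimeWaiters.empty() && majorityReadWaiters.empty());
    }
};
```

      In this model, leaving majorityReadWaiters out of the loop in interruptAllOnStepdown would make the final assertion fire whenever such a waiter is registered, which is the shape of the second issue.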

            Assignee:
            Joan Bruguera Micó
            Reporter:
            Joan Bruguera Micó