TLA+ spec + jstest for stepdown × PDIB deadlock-freedom (SERVER-126266)

    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      TLA+ spec + jstest for SERVER-126266 — possible deadlock between stepdown and primary-driven index build (PDIB). Load-bearing for the 9.0 PDIB release.

      Files

      • src/mongo/tla_plus/IndexBuilds/StepdownPDIBDeadlock/StepdownPDIBDeadlock.tla (378 lines) — three abstract threads (PDIB, Stepdown, BgSync) contend on one RSTL (fair queue) plus a commit-quorum vote set whose arrival is gated on BgSync holding IX. PDIB walks idle → scanning → voting → committing → done/aborted. WaitsFor captures RSTL-queue blocking + PDIB-waiting-on-BgSync commit-quorum starvation; HasCycle enumerates length-2 and length-3 cycles directly.
      • MC + green cfg + bug cfg with deadlock check on
      • OWNERS.yml → 10gen/server-replication-reviewers
      • README.md (298 words)
      • jstests/replsets/stepdown_during_pdib_no_deadlock.js (148 lines) — uses hangIndexBuildBeforeSignalPrimaryForCommitReadiness to park PDIB, fires non-force replSetStepDown, asserts completion under 60s, verifies the rs is not wedged.

      Invariants

      TypeOK, RSTLWellFormed, DeadlockFreedom (no wait-for cycle), EveryStepdownCompletes (liveness, ~>).

      TLC verdicts

      • Green: 268 states / 133 distinct / depth 12 — no error.
      • Bug: Error: Deadlock reached at depth 5 — counterexample PDIBStart → StepdownStart → PDIBFinishScan → BgSyncEnqueue then no enabled action, matching the SERVER-126266 wait-for cycle exactly.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: