TLA+ spec + jstest: standby loop must not race becomePrimary on step-up

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Companion artifact for SERVER-126278 (pali-chaos — standby loop concurrent with becomePrimary causing gRPC fatal).

      Root cause: `stepUpIfEligible()` does cancel+join BEFORE RSTL; RSTL wait window allows queued `accept(kDisconnected)` to fire on another thread → `reconnectAsSecondary()` → new standby loop → fatal gRPC `CallOpRecvInitialMetadata` in `ExternalUnref`. TLC produces 7-step counter-example.

      Files in worktree mongo-w4-20:

      • src/mongo/tla_plus/Replication/StandbyStepUpRace/{*.tla, MC.cfg, MC_bug.cfg, OWNERS.yml}
      • jstests/replsets/standby_loop_does_not_race_become_primary.js

      Status: Draft.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: