TLA+ spec + jstest for setFCV no-op transition waiting on stale opTime (SERVER-120978)

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      TLA+ spec + jstest for SERVER-120978: when setFCV is a no-op transition (already at target), the majority-commit wait uses a stale opTime, so the command can return before the FCV change is durably committed. Subtle but load-bearing in mixed-version fleets.

      Files

      • src/mongo/tla_plus/Replication/SetFCVNoOpMajorityWait/SetFCVNoOpMajorityWait.tla (289 lines) — FCV transition state machine. Actions: PersistFCVWrite, ReplicateFCV, AdvanceCommit, Stepdown, ElectNew, SetFCVNoOp, SetFCVWrite. Invariant SetFCVResultReflectsDurableState. Bug toggle AllowStaleOpTimeOnNoOp switches between the buggy clientLastOp wait and the fix's SystemLastOpTime pin (the setLastOpToSystemLastOpTime line at set_feature_compatibility_version_command.cpp:392).
      • MCSetFCVNoOpMajorityWait. {tla,cfg}
      • OWNERS.yml → 10gen/server-replication-reviewers
      • README.md (275 words)
      • jstests/replsets/set_fcv_noop_returns_before_durable.js (137 lines) — pauses replication via stopReplProducer, forces a replSetStepDown mid-FCV-write, then majority-reads the FCV doc on every node after a no-op setFCV reports ok:1.

      TLC verdicts

      • Green (AllowStaleOpTimeOnNoOp = FALSE): 1,740,696 states, 0 violations, 5–7s.
      • Bug (=TRUE): invariant violated at depth 6 with 4-state trace — two PersistFCVWrites push SystemLastOpTime to 2 while clientLastOp stays at 0; SetFCVNoOp waits on 0 and returns.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: