TLA+ spec + jstest for resharding kAbortedWithoutPrepare silent-loss on restart (SERVER-125589)

    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      TLA+ spec + jstest for SERVER-125589: a kStartOrContinue restart from kAbortedWithoutPrepare can resume past data that was never committed. Disallowing this transition prevents silent data loss.

      Files

      • src/mongo/tla_plus/Sharding/ReshardingAbortRestart/ReshardingAbortRestart.tla (321 lines) — TransactionParticipant state subset (kNone / kInProgress / kAbortedWithoutPrepare / kCommitted), client + recovery-thread (two-block txn_two_phase_commit_cmds.cpp recovery path with gap) + sub-router actions. Bug toggle AllowResumeFromAbortedWithoutPrepare. Invariants NoCommitFromAbortedWithoutPrepare, RecoveryPathDoesNotCrash, TerminalMonotonicity.
      • MCReshardingAbortRestart. {tla,cfg}

        (green, =FALSE). TLC: 66 states, 40 distinct, no error.

      • MCReshardingAbortRestart_bug.cfg (bug, =TRUE). TLC: counterexample in 5 states — begin w2 → abort → sub-router restart with w1 → commit; submitted= {w1,w2}

        , stagedForCommit=

        {w1}

        silent loss.

      • OWNERS.yml → 10gen/server-sharding
      • README.md (279 words)
      • jstests/sharding/resharding_no_resume_from_aborted_without_prepare.js (161 lines) — single-shard, single-participant: insert → abortTransaction → replay (lsid, 0) with startOrContinueTransaction: true and assert.commandFailed; confirm no docs persisted; confirm a fresh txnNumber: 1 commits cleanly.

      Scope note

      Models the TransactionParticipant state machine (root defect from SERVER-85170, surfaced on the coordinator/resharding recovery path via AF-15971). kPrepared/2PC branches and txnRetryCounter (kAbortedOnFirstStatement proposal) intentionally elided.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: