-
Type:
Task
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Summary
TLA+ spec + jstest for SERVER-125589: a kStartOrContinue restart from kAbortedWithoutPrepare can resume past data that was never committed. Disallowing this transition prevents silent data loss.
Files
- src/mongo/tla_plus/Sharding/ReshardingAbortRestart/ReshardingAbortRestart.tla (321 lines) — TransactionParticipant state subset (kNone / kInProgress / kAbortedWithoutPrepare / kCommitted), client + recovery-thread (two-block txn_two_phase_commit_cmds.cpp recovery path with gap) + sub-router actions. Bug toggle AllowResumeFromAbortedWithoutPrepare. Invariants NoCommitFromAbortedWithoutPrepare, RecoveryPathDoesNotCrash, TerminalMonotonicity.
- MCReshardingAbortRestart.
{tla,cfg}
(green, =FALSE). TLC: 66 states, 40 distinct, no error.
- MCReshardingAbortRestart_bug.cfg (bug, =TRUE). TLC: counterexample in 5 states — begin w2 → abort → sub-router restart with w1 → commit; submitted=
{w1,w2}
, stagedForCommit=
{w1}silent loss.
- OWNERS.yml → 10gen/server-sharding
- README.md (279 words)
- jstests/sharding/resharding_no_resume_from_aborted_without_prepare.js (161 lines) — single-shard, single-participant: insert → abortTransaction → replay (lsid, 0) with startOrContinueTransaction: true and assert.commandFailed; confirm no docs persisted; confirm a fresh txnNumber: 1 commits cleanly.
Scope note
Models the TransactionParticipant state machine (root defect from SERVER-85170, surfaced on the coordinator/resharding recovery path via AF-15971). kPrepared/2PC branches and txnRetryCounter (kAbortedOnFirstStatement proposal) intentionally elided.
- is related to
-
SERVER-85170 Implement shard's handling of startOrContinueTransaction field
-
- Closed
-