Summary
TLA+ spec + jstest for SERVER-100155 (CAR-impact 5): a stepdown-interleave lets a migration commit AFTER balancerStop returns OK. Breaks every operator workflow that runs mongodump / mongosync after stopping the balancer — including documented backup procedures.
Files
- src/mongo/tla_plus/Sharding/BalancerStopRace/BalancerStopRace.tla (306 lines) — composes balancer-coordinator × config-stepdown × migration-commit-path. AllowStepdownInterleave bug toggle; ghost counter commitsAfterStopOk; invariant StopThenObserveStableState. Migration table modelled as a fixed-arity function [1..MaxMigrations -> {unset, commitInFlight, committed, aborted\}] rather than a sequence, to keep state space tractable.
- MCBalancerStopRace.tla / .cfg (green) + _Bug.cfg
- OWNERS.yml → 10gen/server-sharding
- README.md (295 words)
- jstests/sharding/balancer_stop_no_round_after_return.js (196 lines) — pins donor at hangBeforeSendingCommitDecision, fires configRS.stepUp mid-commit, runs balancerStop in a parallel shell that stamps the OK timestamp into a sentinel doc, releases the failpoint, then asserts every chunk's history[].validAfter predates that OK timestamp.
TLC verdicts
- Green: 373 states / 197 distinct / depth 15 / "No error has been found".
- Bug: Invariant StopThenObserveStableState is violated with predicted 8-state counterexample BeginRound → DispatchMigration → StepDown → EndRoundOnInterrupt → BalancerStopBegin → BalancerStopReturnOk → ShardCommitsMigration; terminal commitsAfterStopOk = 1.
Key insight surfaced by the spec
The bug isn't strictly that joinCurrentRound returns early — it's that _endRound clears _inBalancerRound on the InterruptedDueToReplStateChange catch path without draining migrations whose future-chain has already been forked to the shard. The spec-implied fix: _endRound on the interrupt branch must wait for the command scheduler's in-flight migrations to reach a terminal committed | aborted state.
- is related to
-
SERVER-100155 Balancer round may complete after {balancerStop:1} in presence of stepdowns
-
- Backlog
-