-
Type:
Task
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Summary
Single combined TLA+ spec covering SERVER-111230 + SERVER-92437 + SERVER-121914 — the family of orphan/temp-collection bugs arising when setFCV downgrade interleaves with in-flight reshardCollection or moveChunk. All three share the skeleton "multi-phase write across catalog + per-shard rangeDeletions, interrupted by FCV abort, recovery hook elides cleanup."
Files
- src/mongo/tla_plus/Sharding/ReshardingFCVDowngradeOrphans/ReshardingFCVDowngradeOrphans.tla (401 lines) — FCV × reshard donor/recipient × migration-coordinator × range-deletion product. Three bug toggles (BugTempCollLeak, BugRangeDelLeak, BugStaleChunks) + four invariants (the three ticket-specific ones plus IndexBuildDecisionFCVConsistent via a mixedFcvCommit ghost variable that captures the SERVER-92437 mixed-FCV commit fingerprint at commit-time).
- MCReshardingFCVDowngradeOrphans.tla + green cfg + _Bug111230.cfg + _Bug92437.cfg + _Bug121914.cfg (one per ticket)
- OWNERS.yml → 10gen/server-sharding
- README.md (61 lines)
- jstests/sharding/resharding_fcv_downgrade_orphan_cleanup.js (260 lines) — three repro blocks driving the three failure modes via reshardingPauseDonorBeforeDonatingInitialData, reshardingPauseRecipientDuringCloning, and hangBeforeRemovingRangeDeletionEntry failpoints.
TLC verdicts
- Green: No error has been found (370 distinct states, ~13s).
- Bug111230: Invariant TempCollectionMetadataClearedOnFCVAbort is violated.
- Bug92437: Invariant RangeDeletionsClearedOnReshardAbort is violated.
- Bug121914: Invariant NoStaleConfigChunksAfterFCVDowngrade is violated.
Key model decision
Ghost variable mixedFcvCommit captures the SERVER-92437 fingerprint (recipient committing while FCV=DOWNGRADING with indexBuildSkipped=TRUE) at the action that creates the anomaly rather than as a post-hoc state predicate. Without this, TLC flagged the benign "FCV downgraded after recipient finished" sequence as a violation.
- is related to
-
SERVER-92437 Problems with resharding operation that starts while the FCV is "8.0" or "7.3" and commits while the FCV is "Downgrading to 7.0"
-
- Open
-
-
SERVER-111230 reshardCollection may fail to clean up metadata on its temporary collection when aborted
-
- Backlog
-
-
SERVER-121914 Downgrade to FCV 8.0 can leave uncleaned orphan documents
-
- Investigating
-