Combined TLA+ spec for resharding × FCV downgrade orphan family (SERVER-111230 + SERVER-92437 + SERVER-121914)

    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      Single combined TLA+ spec covering SERVER-111230 + SERVER-92437 + SERVER-121914 — the family of orphan/temp-collection bugs arising when setFCV downgrade interleaves with in-flight reshardCollection or moveChunk. All three share the skeleton "multi-phase write across catalog + per-shard rangeDeletions, interrupted by FCV abort, recovery hook elides cleanup."

      Files

      • src/mongo/tla_plus/Sharding/ReshardingFCVDowngradeOrphans/ReshardingFCVDowngradeOrphans.tla (401 lines) — FCV × reshard donor/recipient × migration-coordinator × range-deletion product. Three bug toggles (BugTempCollLeak, BugRangeDelLeak, BugStaleChunks) + four invariants (the three ticket-specific ones plus IndexBuildDecisionFCVConsistent via a mixedFcvCommit ghost variable that captures the SERVER-92437 mixed-FCV commit fingerprint at commit-time).
      • MCReshardingFCVDowngradeOrphans.tla + green cfg + _Bug111230.cfg + _Bug92437.cfg + _Bug121914.cfg (one per ticket)
      • OWNERS.yml → 10gen/server-sharding
      • README.md (61 lines)
      • jstests/sharding/resharding_fcv_downgrade_orphan_cleanup.js (260 lines) — three repro blocks driving the three failure modes via reshardingPauseDonorBeforeDonatingInitialData, reshardingPauseRecipientDuringCloning, and hangBeforeRemovingRangeDeletionEntry failpoints.

      TLC verdicts

      • Green: No error has been found (370 distinct states, ~13s).
      • Bug111230: Invariant TempCollectionMetadataClearedOnFCVAbort is violated.
      • Bug92437: Invariant RangeDeletionsClearedOnReshardAbort is violated.
      • Bug121914: Invariant NoStaleConfigChunksAfterFCVDowngrade is violated.

      Key model decision

      Ghost variable mixedFcvCommit captures the SERVER-92437 fingerprint (recipient committing while FCV=DOWNGRADING with indexBuildSkipped=TRUE) at the action that creates the anomaly rather than as a post-hoc state predicate. Without this, TLC flagged the benign "FCV downgraded after recipient finished" sequence as a violation.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: