Add TLA+ spec for change-stream resume-token cursor arming

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      The existing src/mongo/tla_plus/ corpus covers consensus (RaftMongo), reconfiguration, chunk migration, retryable txns moving with chunks, range deletions on secondaries, and ticketed concurrency. It does not cover any aspect of the change-stream cursor lifecycle.

      This ticket proposes adding a new TLA+ specification at src/mongo/tla_plus/ChangeStreams/ResumeTokenArming/ that models the well-known lazy-cursor-arming hazard: when a driver implements db.coll.watch() lazily and does not pin the cursor with startAtOperationTime, the server-side cursor's open point becomes the time of the first getMore, not the time watch() returned to the caller. Any change event with a clusterTime in the open window [client_watch_returned, first_getMore_arrives_at_server) is silently dropped.

      The proposed spec parameterizes the model with a boolean UseStartAt and shows by TLC model-checking that:

      • With UseStartAt = FALSE, the NoEventLoss invariant is violated by a short counterexample (~depth 9, 285 states).
      • With UseStartAt = TRUE, TLC completes clean (629 distinct states, depth 11).

      This gives the change-streams team a formal artifact pinning the cursor-arming invariant that drivers must satisfy, and a machine-checked counterexample that drivers (or future server-side cursor refactors) can be tested against.

      Scope

      • Pure addition to src/mongo/tla_plus/ChangeStreams/ResumeTokenArming/:
        • ResumeTokenArming.tla (~326 lines)
        • MCResumeTokenArming.tla (~31 lines)
        • MCResumeTokenArming.cfg (~26 lines)
        • OWNERS.yml
        • README.md
      • No C++ source changes.
      • No test changes outside the TLA+ corpus.

      Run

      cd src/mongo/tla_plus
      ./download-tlc.sh    # one-time; pulls tla2tools.jar
      ./model-check.sh ChangeStreams/ResumeTokenArming
      

      Default cfg ships with UseStartAt = FALSE to make the counterexample visible to the reviewer. Toggle to TRUE to verify the fixed-mode invariance.

      Acceptance criteria

      • Files land under src/mongo/tla_plus/ChangeStreams/ResumeTokenArming/.
      • ./model-check.sh ChangeStreams/ResumeTokenArming produces the documented UseStartAt=FALSE counterexample.
      • Owners file routes the spec to the appropriate team.
      • README is self-contained for a kernel engineer with no prior change-streams context.

      Related work

      No prior TLA+ spec covers change streams in this repo. Adjacent in spirit:

      • src/mongo/tla_plus/Replication/RaftMongo/ — uses the same actor/oplog idiom.
      • src/mongo/tla_plus/Sharding/RangeDeletionsSecondaryNodes/ — uses the same background-actor / consumer idiom.

      Implementation status

      Draft of all five files authored and TLC-verified locally. Forbidden-term review complete. PR to follow on this ticket.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: