TLA+ spec + regression jstest for fsyncLock leaving durableOpTime stuck behind lastWritten (SERVER-126254)

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      TLA+ spec capturing the stable wedge SERVER-126254 documents — when fsyncLock holds Global S, nothing on the primary can advance durableOpTime even though lastWritten can still advance via concurrent w:1 j:false writes — plus a regression jstest promoting the ticket's attached repro.

      Files

      • src/mongo/tla_plus/Replication/FsyncLockDurableOpTime/FsyncLockDurableOpTime.tla (203 lines) — 3 state vars (lastWritten, durableOpTime, globalSLockHeld), 4 actions (CommitWUOW, AcquireGlobalS, ReleaseGlobalS, AdvanceDurable). BugEnabled gates AdvanceDurable on ~globalSLockHeld.
      • MCFsyncLockDurableOpTime. {tla,cfg} (green, BugEnabled=FALSE)
        * MCFsyncLockDurableOpTimeBug.{tla,cfg}

        (bug, BugEnabled=TRUE, drops WF on ReleaseGlobalS so the wedge is observable)

      • OWNERS.yml → 10gen/server-replication-reviewers
      • README.md (300 words)
      • jstests/replsets/fsync_lock_durable_optime_stuck.js (155 lines) — pauseJournalFlusherBeforeFlush failpoint → w:1 j:false insert → assert durable < written → fsyncLock → release failpoint → assert.soon durable catches up within 30s → fsyncUnlock

      TLC verdicts

      • Green: 12 distinct states, no errors, completes in <1s.
      • Bug: temporal property violated with the textbook counterexample — CommitWUOW × 2, AdvanceDurable, AcquireGlobalS, stutter at (lastWritten=2, durableOpTime=1, globalSLockHeld=TRUE). Matches the ticket's stable equilibrium step-for-step.

            Assignee:
            Unassigned
            Reporter:
            Mehar Grewal
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: