-
Type:
Task
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Summary
TLA+ spec capturing the stable wedge SERVER-126254 documents — when fsyncLock holds Global S, nothing on the primary can advance durableOpTime even though lastWritten can still advance via concurrent w:1 j:false writes — plus a regression jstest promoting the ticket's attached repro.
Files
- src/mongo/tla_plus/Replication/FsyncLockDurableOpTime/FsyncLockDurableOpTime.tla (203 lines) — 3 state vars (lastWritten, durableOpTime, globalSLockHeld), 4 actions (CommitWUOW, AcquireGlobalS, ReleaseGlobalS, AdvanceDurable). BugEnabled gates AdvanceDurable on ~globalSLockHeld.
- MCFsyncLockDurableOpTime.
{tla,cfg} (green, BugEnabled=FALSE)
* MCFsyncLockDurableOpTimeBug.{tla,cfg}(bug, BugEnabled=TRUE, drops WF on ReleaseGlobalS so the wedge is observable)
- OWNERS.yml → 10gen/server-replication-reviewers
- README.md (300 words)
- jstests/replsets/fsync_lock_durable_optime_stuck.js (155 lines) — pauseJournalFlusherBeforeFlush failpoint → w:1 j:false insert → assert durable < written → fsyncLock → release failpoint → assert.soon durable catches up within 30s → fsyncUnlock
TLC verdicts
- Green: 12 distinct states, no errors, completes in <1s.
- Bug: temporal property violated with the textbook counterexample — CommitWUOW × 2, AdvanceDurable, AcquireGlobalS, stutter at (lastWritten=2, durableOpTime=1, globalSLockHeld=TRUE). Matches the ticket's stable equilibrium step-for-step.
- is related to
-
SERVER-126254 fsyncLock leaves durableOpTime stuck behind lastWritten, hanging snapshot/majority reads
-
- In Code Review
-