-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Storage Execution
-
ALL
-
Storage Execution 2026-05-11, Storage Execution 2026-05-25
If an oplog entry is in the "applied but not yet durable" window on a primary at the moment fsyncLock acquires Global S, the in-memory durableOpTime stays stuck behind lastWritten for the entire lifetime of the lock. lastCommittedOpTime cannot advance past the stuck timestamp, so any snapshot or majority read with afterClusterTime past the in-flight entry hangs indefinitely until fsyncUnlock is called. The bug applies to any primary, any secondary count, any storage workload.
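The mechanism above can be sketched as a toy state machine. This is only an illustrative model, not server code: the class `ToyPrimary`, its fields, and the use of plain integers as timestamps are all hypothetical, and the rule that the commit point cannot pass the node's durable optime is a simplifying assumption for a single-node view.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Toy model of a primary's optimes; integers stand in for timestamps."""
    last_written: int = 0
    durable: int = 0
    last_committed: int = 0
    global_s_held: bool = False  # True while fsyncLock is in effect

    def commit_oplog_entry(self) -> None:
        # An asynchronous oplog writer commits a WUOW; lastWritten advances
        # right away, before the entry is journaled.
        if self.global_s_held:
            raise RuntimeError("writer needs Global IX, blocked by Global S")
        self.last_written += 1

    def journal_flush(self) -> bool:
        # Models the journal flusher, which waits on the Global lock while
        # fsyncLock holds it. durable only moves here.
        if self.global_s_held:
            return False
        self.durable = self.last_written
        return True

    def advance_commit_point(self) -> None:
        # Simplifying assumption: the commit point cannot pass this node's
        # durable optime.
        self.last_committed = max(self.last_committed, self.durable)

    def majority_read_ready(self, after_cluster_time: int) -> bool:
        return self.last_committed >= after_cluster_time


def simulate() -> dict:
    p = ToyPrimary()
    p.commit_oplog_entry()   # entry 1 is written but not yet durable
    p.global_s_held = True   # fsyncLock lands inside that window
    p.journal_flush()        # cannot run while the lock is held
    p.advance_commit_point()
    ready_while_locked = p.majority_read_ready(after_cluster_time=1)
    p.global_s_held = False  # fsyncUnlock
    p.journal_flush()
    p.advance_commit_point()
    return {
        "ready_while_locked": ready_while_locked,
        "ready_after_unlock": p.majority_read_ready(after_cluster_time=1),
    }
```

In this model a read with afterClusterTime 1 is not satisfiable while the lock is held and becomes satisfiable only after the unlock lets the flush run.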
Symptoms
When the bug triggers, the in-flight entry comes from whichever asynchronous primary-side oplog writer happens to commit a WUOW in the few-hundred-microsecond gap between that commit and the next periodic journal fsync. Externally visible effects on a hung primary:
- $currentOp on system threads:
  JournalFlusher waitingForLock: true locks: {Global: "w"}
  OplogCapMaintainerThread waitingForLock: true locks: {Global: "w"}
  ChangeStreamExpiredPreImagesRemover waitingForLock: true locks: {Global: "w"}
  fsyncLockWorker waitingForLock: false locks: {Global: "R"}
- appliedOpTime == writtenOpTime, ahead of durableOpTime == lastCommittedOpTime.
- User snapshot or majority reads with afterClusterTime past durableOpTime hang for the duration of the lock.
- Operations attempting any IX on Global block (writes, j:true/majority writes; NoopWriter bails out via its 1 ms timeout).
The hang is a stable equilibrium: nothing on the primary can advance durableOpTime while Global S is held, so the state persists until fsyncUnlock is called.
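As a rough aid for triage, the symptom list above can be turned into a small check over already-collected diagnostics. This is a sketch under assumptions: the function name `matches_stuck_signature`, the `optimes` dictionary keys, and the use of integers as stand-ins for optimes are hypothetical. The `desc`, `waitingForLock`, and `locks` fields mirror the $currentOp output quoted above, but the exact document shape on a given server version should be verified before relying on this.

```python
def matches_stuck_signature(optimes: dict, system_ops: list) -> bool:
    """Return True if the inputs look like the hang described in this ticket.

    optimes: hypothetical dict with integer stand-ins under the keys
             "applied", "written", "durable", "lastCommitted".
    system_ops: list of dicts shaped like the $currentOp entries above.
    """
    by_desc = {op["desc"]: op for op in system_ops}
    fsync_worker = by_desc.get("fsyncLockWorker")
    flusher = by_desc.get("JournalFlusher")
    if fsync_worker is None or flusher is None:
        return False

    # fsyncLockWorker holds Global "R" and is not waiting.
    lock_held = (
        not fsync_worker["waitingForLock"]
        and fsync_worker["locks"].get("Global") == "R"
    )
    # JournalFlusher is queued behind it, waiting for Global "w".
    flusher_blocked = (
        flusher["waitingForLock"]
        and flusher["locks"].get("Global") == "w"
    )
    # applied == written, ahead of durable == lastCommitted.
    optimes_stuck = (
        optimes["applied"] == optimes["written"]
        and optimes["durable"] == optimes["lastCommitted"]
        and optimes["durable"] < optimes["written"]
    )
    return lock_held and flusher_blocked and optimes_stuck
```

The check is deliberately strict: it requires all three conditions, so an ordinary fsyncLock with no in-flight entry (all four optimes equal) does not match.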
- is depended on by
SERVER-123573 Investigate the hang of fsync.js with replicated size and count [Blocked]
- related to
SERVER-123573 Investigate the hang of fsync.js with replicated size and count [Blocked]
SERVER-126548 TLA+ spec + regression jstest for fsyncLock leaving durableOpTime stuck behind lastWritten (SERVER-126254) [Needs Verification]