-
Type: Bug
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Storage Execution
-
ALL
-
Storage Execution 2026-03-16
-
(copied to CRM)
While a node is primary, a long-running aggregation query can hold on to an old CollectionCatalog snapshot across rollback, keeping the old oplog RecordStore alive even after the node reopens the catalog and creates a new oplog RecordStore. The oplog visibility thread is restarted for the new RecordStore during rollback catalog re-initialization, but when the aggregation finally unwinds and destroys the old RecordStore, its destructor shuts down the visibility thread even though the new RecordStore is now the active oplog RecordStore on a primary. There is no code path to restart the visibility thread after this, so the node can remain primary with no oplog visibility thread running.
This bug impacts availability. A node that becomes primary without the oplog visibility thread running cannot replicate oplog entries to secondaries (oplog readers cannot read past the visibility point), leading to a loss of the ability to commit majority writes.
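The lifetime behavior described above can be modeled with a minimal C++ sketch (the type names below are simplified stand-ins, not the actual server classes): a query that pins a catalog snapshot via shared_ptr keeps the old oplog RecordStore alive across the catalog close/reopen, and the old RecordStore is only destroyed when the query unwinds.

```cpp
// Sketch only: hypothetical simplified types modeling shared_ptr snapshot
// pinning. The real classes are CollectionCatalog and WiredTigerRecordStore.
#include <memory>

struct RecordStore {};
struct CollectionCatalog {
    std::shared_ptr<RecordStore> oplogRs;
};

// Returns true if RS_old stays alive while the query snapshot is held across
// the catalog swap, and is destroyed only once the snapshot is released.
bool oldRecordStoreOutlivesCatalogSwap() {
    auto catalog = std::make_shared<CollectionCatalog>();
    catalog->oplogRs = std::make_shared<RecordStore>();
    std::weak_ptr<RecordStore> rsOld = catalog->oplogRs;

    // The aggregation pins the snapshot, as CollectionCatalog::get(opCtx) would.
    std::shared_ptr<CollectionCatalog> querySnapshot = catalog;

    // closeCatalog()/openCatalog(): the globally active catalog is replaced
    // with a fresh instance holding a new oplog RecordStore (RS_new).
    catalog = std::make_shared<CollectionCatalog>();
    catalog->oplogRs = std::make_shared<RecordStore>();

    bool aliveWhilePinned = !rsOld.expired();  // RS_old survives the swap
    querySnapshot.reset();  // aggregation unwinds, dropping the last reference
    bool destroyedAfterUnwind = rsOld.expired();
    return aliveWhilePinned && destroyedAfterUnwind;
}
```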
Detailed race condition steps:
1. Node is running as a primary.
- WiredTigerOplogManager is running the oplog visibility thread normally. WiredTigerOplogManager owns a single oplog visibility thread (stdx::thread) and manages the lifetime of the thread.
2. An aggregation query begins and checks out the in-memory CollectionCatalog.
- This obtains a shared_ptr on the collection catalog via CollectionCatalog::get(opCtx).
- This snapshot includes the oplog’s WiredTigerRecordStore object (call this RS_old).
- As long as this aggregation is alive and holding the shared_ptr, the CollectionCatalog object and RS_old cannot be destroyed.
3. The node steps down and goes into rollback (catalog re-initialization).
- The node interrupts ongoing operations, including the aggregation. However, the aggregation is not currently at an interrupt point, so it does not yet observe the interruption.
- Rollback closes and reopens the catalog (closeCatalog(), openCatalog()), forcing a refresh of the in-memory catalog state.
- closeCatalog() clears the list of collections, which decrements the shared_ptr refcounts inside the catalog. However, the ongoing aggregation still holds a reference to this CollectionCatalog object, so the oplog’s RS_old is not destroyed.
- openCatalog() reloads the catalog and reinitializes the oplog RecordStore as a new instance (call this RS_new). This reinitialization stops and then starts the oplog visibility thread using RS_new.
4. Rollback completes, the node returns to normal service, and it becomes primary again.
- The long-running aggregation is still active on the node and is still holding the old CollectionCatalog object.
- It has a live reference to RS_old.
5. The aggregation operation eventually hits an interrupt point and fails with an InterruptedDueToReplStateChange error from the rollback interrupt.
- While unwinding the call stack, it drops the last reference to the old CollectionCatalog snapshot and RS_old.
- The oplog WiredTigerRecordStore destructor is called and it stops the oplog visibility thread, even though the currently running visibility thread is the one started for RS_new.
6. The oplog visibility thread is now shut down, and there is no code path to restart it in this state. The node must be restarted to fix the issue.
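The destructor ordering at the heart of steps 5 and 6 can be sketched as follows. This is a hedged illustration with hypothetical simplified types: a single manager owns one visibility-thread slot, and each RecordStore's destructor halts it unconditionally, so destroying the stale RS_old kills the thread that RS_new depends on.

```cpp
// Sketch only: hypothetical simplified types modeling the destructor race.
// In the real code the thread is a stdx::thread owned by WiredTigerOplogManager.
#include <memory>

struct OplogManager {
    bool visibilityThreadRunning = false;
    void startVisibilityThread() { visibilityThreadRunning = true; }
    void haltVisibilityThread() { visibilityThreadRunning = false; }
};

struct OplogRecordStore {
    OplogManager* mgr;
    explicit OplogRecordStore(OplogManager* m) : mgr(m) {
        mgr->startVisibilityThread();
    }
    // The destructor halts the shared thread unconditionally, with no check
    // of whether this instance is still the active oplog RecordStore.
    ~OplogRecordStore() { mgr->haltVisibilityThread(); }
};

// Returns true if the visibility thread is dead after RS_old is destroyed,
// even though RS_new is still the active oplog RecordStore.
bool visibilityThreadDeadAfterOldRsDestroyed() {
    OplogManager mgr;
    // Step 1: primary running, RS_old active, visibility thread up.
    auto rsOld = std::make_shared<OplogRecordStore>(&mgr);
    // Step 3: rollback reopens the catalog and restarts the thread for RS_new.
    auto rsNew = std::make_shared<OplogRecordStore>(&mgr);
    // Step 5: the aggregation unwinds; ~OplogRecordStore on RS_old halts
    // the visibility thread that RS_new is relying on.
    rsOld.reset();
    return !mgr.visibilityThreadRunning;  // step 6: no path restarts it
}
```

A possible fix direction, consistent with the description above, would be for the destructor to halt the thread only if its instance is still the registered active oplog RecordStore.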