- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Replication
- None
- Fully Compatible
- ALL
- Repl 2019-10-07, Repl 2019-10-21
- 13
A hang was observed in the DeleteOpIsIdBased unittest in repltests.cpp. The test performs several deletes (which create delete oplog entries) and immediately queries the oplog, triggering a call to waitForAllEarlierOplogWritesToBeVisible. The stack trace is approximately:
Thread 1: "testsuite" (Thread 0x7fdf3e0f7ac0 (LWP 67799))
.
#10 0x000055f6e9f94225 in mongo::Interruptible::waitForConditionOrInterrupt
#11 mongo::WiredTigerOplogManager::waitForAllEarlierOplogWritesToBeVisible
.
#16 0x000055f6eb3e54fb in mongo::(anonymous namespace)::FindCmd::Invocation::run
.
#28 0x000055f6e97c822d in ReplTests::Base::applyAllOperations
#29 0x000055f6e9824b59 in ReplTests::DeleteOpIsIdBased::run
This hang was observed approximately once in Evergreen. It seems likely to be a race between the WTOplogJournalThread and the main thread: the main thread expects the WTOplogJournalThread to call _setOplogReadTimestamp, but that thread either already has or never does. As lingzhi.deng showed me, the cause may be that waitForAllEarlierOplogWritesToBeVisible increments _opsWaitingForVisibility to tell the journal thread that someone is waiting for it, while the journal thread checks a different member, _opsWaitingForJournal, to determine whether there are any waiters.
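The suspected counter mismatch can be illustrated with a minimal, single-threaded sketch. Everything here is hypothetical: OplogManagerModel and its member functions are illustration-only names, not the real WiredTigerOplogManager code, and the second check is only one possible way to account for visibility waiters, not necessarily the fix that was committed.

```cpp
#include <cstdint>

// Hypothetical, simplified model of the waiter bookkeeping described above.
// It is NOT the real WiredTigerOplogManager; only the two counter names
// mirror the members mentioned in this ticket.
struct OplogManagerModel {
    std::int64_t opsWaitingForJournal = 0;
    std::int64_t opsWaitingForVisibility = 0;

    // Models what a visibility waiter does before blocking:
    // it announces itself through the visibility counter only.
    void registerVisibilityWaiter() {
        ++opsWaitingForVisibility;
    }

    // Suspected behavior: the journal thread looks only at the
    // journal counter when deciding whether anyone is waiting.
    bool hasWaitersJournalOnly() const {
        return opsWaitingForJournal > 0;
    }

    // Assumed alternative: consult both counters, so a visibility
    // waiter is never overlooked.
    bool hasWaitersBothCounters() const {
        return opsWaitingForJournal > 0 || opsWaitingForVisibility > 0;
    }
};
```

With one visibility waiter registered, hasWaitersJournalOnly() still reports no waiters, so a journal thread relying on it could go idle without calling _setOplogReadTimestamp, leaving the waiter blocked as in the stack trace above. hasWaitersBothCounters() reports the waiter in the same state.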
- related to
  SERVER-44196 Complete TODO listed in SERVER-43399 (Closed)