Details
- Bug
- Resolution: Fixed
- Major - P3
- None
- None
- Fully Compatible
- ALL
- Repl 2019-10-07, Repl 2019-10-21
- 13
Description
A hang was observed in the DeleteOpIsIdBased unittest in repltests.cpp. The test performs several deletes (which create delete oplog entries) and immediately queries the oplog, triggering a call to waitForAllEarlierOplogWritesToBeVisible. The stack trace is approximately:
Thread 1: "testsuite" (Thread 0x7fdf3e0f7ac0 (LWP 67799))
...
#10 0x000055f6e9f94225 in mongo::Interruptible::waitForConditionOrInterrupt
#11 mongo::WiredTigerOplogManager::waitForAllEarlierOplogWritesToBeVisible
...
#16 0x000055f6eb3e54fb in mongo::(anonymous namespace)::FindCmd::Invocation::run
...
#28 0x000055f6e97c822d in ReplTests::Base::applyAllOperations
#29 0x000055f6e9824b59 in ReplTests::DeleteOpIsIdBased::run
This hang has been observed approximately once in Evergreen. It is likely a race between the WTOplogJournalThread and the main thread: the main thread expects the WTOplogJournalThread to call _setOplogReadTimestamp, but that thread either already has or never does. As lingzhi.deng showed me, the cause may be that waitForAllEarlierOplogWritesToBeVisible increments _opsWaitingForVisibility to tell the journal thread that someone is waiting for it, while the journal thread checks a different member, _opsWaitingForJournal, to determine whether there are any waiters.
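To make the suspected mechanism concrete, here is a minimal, self-contained sketch of the mismatched-counter pattern. It is not the real WiredTigerOplogManager code: the class ToyOplogManager, the checkVisibilityWaiters flag, the generation counter, and the timeout are all hypothetical, introduced only so the toy terminates instead of hanging. Only the two counter names are taken from the description above.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for the oplog manager. Hypothetical: it only mirrors the two
// counter names from the description, not the real WiredTigerOplogManager.
class ToyOplogManager {
public:
    // checkVisibilityWaiters == false models the suspected bug: the journal
    // thread looks only at _opsWaitingForJournal when deciding to wake up.
    explicit ToyOplogManager(bool checkVisibilityWaiters)
        : _checkVisibilityWaiters(checkVisibilityWaiters),
          _journalThread([this] { _run(); }) {}

    ~ToyOplogManager() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _journalCv.notify_all();
        _journalThread.join();
    }

    // Returns true if the journal thread published visibility before the
    // timeout. The timeout is an assumption so the sketch cannot hang.
    bool waitForAllEarlierWritesToBeVisible(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        const long target = _publishedGeneration + 1;
        ++_opsWaitingForVisibility;  // the waiter announces itself here...
        _journalCv.notify_all();
        return _visibleCv.wait_for(
            lk, timeout, [&] { return _publishedGeneration >= target; });
    }

private:
    bool _hasWaiters() const {
        // ...but in the buggy mode the journal thread never reads that counter.
        return _opsWaitingForJournal > 0 ||
            (_checkVisibilityWaiters && _opsWaitingForVisibility > 0);
    }

    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        for (;;) {
            _journalCv.wait(lk, [this] { return _shutdown || _hasWaiters(); });
            if (_shutdown) return;
            assert(lk.owns_lock());
            ++_publishedGeneration;        // stands in for _setOplogReadTimestamp
            _opsWaitingForJournal = 0;     // toy simplification: all requests served
            _opsWaitingForVisibility = 0;
            _visibleCv.notify_all();
        }
    }

    std::mutex _mutex;
    std::condition_variable _journalCv, _visibleCv;
    long _opsWaitingForJournal = 0;
    long _opsWaitingForVisibility = 0;
    long _publishedGeneration = 0;
    bool _shutdown = false;
    const bool _checkVisibilityWaiters;
    std::thread _journalThread;  // declared last so it starts after the other members
};
```

With checkVisibilityWaiters set to false, a call such as ToyOplogManager(false).waitForAllEarlierWritesToBeVisible(std::chrono::milliseconds(200)) times out, because the toy journal thread never sees a waiter; without the timeout this would be the indefinite wait seen in frame #11. With the flag set to true, the same call returns once the journal thread publishes. Whether the actual fix took this shape is not stated in this ticket.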
Issue Links
- related to SERVER-44196 Complete TODO listed in SERVER-43399 (Closed)