[SERVER-48279] Race in WiredTigerRecordStore::OplogStones::awaitHasExcessStonesOrDead Created: 18/May/20  Updated: 29/Oct/23  Resolved: 05/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.0-rc10, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Henrik Edin Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Execution Team 2020-06-15
Participants:
Linked BF Score: 14

 Description   

WiredTigerRecordStore::OplogStones uses two mutexes for synchronization: _oplogReclaimMutex (outer) and _mutex (inner).

Code that notifies the OplogCapMaintainerThread locks only the inner mutex before signaling the condition variable to wake the thread. Example here: https://github.com/mongodb/mongo/blob/20de257ec7f9f1def474e7a62375df364ae85f4b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L199-L220

But there is a window here https://github.com/mongodb/mongo/blob/20de257ec7f9f1def474e7a62375df364ae85f4b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L258-L259 where the OplogCapMaintainerThread has not yet started to wait on the condition variable, so a notify call made in that window does nothing. The thread then keeps waiting until something else issues a _pokeReclaimThreadIfNeeded() call. Tests that don't do anything else will eventually time out.
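
The window can be illustrated with a simplified model. This is only a sketch: the class RacyStones, its members, and the stone-count predicate are hypothetical stand-ins and are not copied from wiredtiger_record_store.cpp. It assumes the waiter checks for work under the inner mutex and then waits on the outer one, while the notifier holds only the inner mutex.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified model of the two-mutex layout (not the server code).
class RacyStones {
public:
    // Notifier side: holds only the inner mutex while signaling.
    void addStone() {
        std::lock_guard<std::mutex> inner(_mutex);
        ++_numStones;
        if (_numStones > kMaxStones) {
            _reclaimCv.notify_one();  // lost if the waiter is inside the window below
        }
    }

    // Waiter side: checks the predicate under the inner mutex, waits on the outer one.
    void awaitExcessStones() {
        std::unique_lock<std::mutex> outer(_reclaimMutex);
        while (true) {
            {
                std::lock_guard<std::mutex> inner(_mutex);
                if (_numStones > kMaxStones) {
                    return;
                }
            }
            // WINDOW: the inner mutex is released but the wait has not started.
            // addStone() can run to completion here; its notify_one() finds no waiter.
            _reclaimCv.wait(outer);
        }
    }

    int numStones() {
        std::lock_guard<std::mutex> inner(_mutex);
        return _numStones;
    }

private:
    static constexpr int kMaxStones = 2;  // assumed threshold for illustration
    std::mutex _reclaimMutex;             // outer
    std::mutex _mutex;                    // inner
    std::condition_variable _reclaimCv;
    int _numStones = 0;
};

// Deterministic usage: the predicate is already true, so no wait is needed.
inline int demoNoWaitNeeded() {
    RacyStones stones;
    for (int i = 0; i < 3; ++i) {
        stones.addStone();
    }
    stones.awaitExcessStones();  // returns immediately
    assert(stones.numStones() == 3);
    return stones.numStones();
}
```

The demo deliberately avoids running the waiter and notifier concurrently, because in this racy form a concurrent run can hang if the last notification lands in the window.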

To fix this, I propose that we always take both mutexes (in the same order) to eliminate this window. The outer _oplogReclaimMutex should not be contended, so this should be safe to do.
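
A minimal sketch of this proposal, using the same kind of hypothetical model (FixedStones and its members are illustrative names, not the committed change): the notifier acquires the outer mutex before the inner one, so it cannot run between the waiter's check and the start of its wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the proposed fix: both sides lock outer, then inner.
class FixedStones {
public:
    void addStone() {
        // Same lock order as the waiter, which avoids deadlock.
        std::lock_guard<std::mutex> outer(_reclaimMutex);
        std::lock_guard<std::mutex> inner(_mutex);
        ++_numStones;
        if (_numStones > kMaxStones) {
            _reclaimCv.notify_one();
        }
    }

    void awaitExcessStones() {
        std::unique_lock<std::mutex> outer(_reclaimMutex);
        while (true) {
            {
                std::lock_guard<std::mutex> inner(_mutex);
                if (_numStones > kMaxStones) {
                    return;
                }
            }
            // The outer mutex is still held here, so addStone() is blocked
            // until wait() atomically releases it and starts waiting.
            _reclaimCv.wait(outer);
        }
    }

    int numStones() {
        std::lock_guard<std::mutex> inner(_mutex);
        return _numStones;
    }

private:
    static constexpr int kMaxStones = 2;  // assumed threshold for illustration
    std::mutex _reclaimMutex;             // outer
    std::mutex _mutex;                    // inner
    std::condition_variable _reclaimCv;
    int _numStones = 0;
};

// Runs a waiter thread against a notifier; terminates for every interleaving.
inline int runFixedDemo() {
    FixedStones stones;
    std::thread waiter([&stones] { stones.awaitExcessStones(); });
    for (int i = 0; i < 3; ++i) {
        stones.addStone();
    }
    waiter.join();
    assert(stones.numStones() == 3);
    return stones.numStones();
}
```

The cost is that every notifier now touches the outer mutex, which is acceptable only if that mutex is uncontended, as the description expects.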

An alternative solution would be to wait with a timeout and then check whether any work needs to be done. But that would require taking the inner _mutex unnecessarily.
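
The timed-wait alternative could look roughly like the following. Again this is a sketch with hypothetical names (PollingStones, the 50 ms interval), not code from the server: the notifier still holds only the inner mutex, and a lost notification merely delays the next check by one interval.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the timed-wait alternative.
class PollingStones {
public:
    // Notifier is unchanged: only the inner mutex is held while signaling.
    void addStone() {
        std::lock_guard<std::mutex> inner(_mutex);
        ++_numStones;
        if (_numStones > kMaxStones) {
            _reclaimCv.notify_one();
        }
    }

    void awaitExcessStones() {
        const auto pollInterval = std::chrono::milliseconds(50);  // assumed value
        std::unique_lock<std::mutex> outer(_reclaimMutex);
        while (true) {
            {
                // Taken on every wakeup, including timeouts with no work to do.
                std::lock_guard<std::mutex> inner(_mutex);
                if (_numStones > kMaxStones) {
                    return;
                }
            }
            // A missed notify_one() is recovered when the timeout expires.
            _reclaimCv.wait_for(outer, pollInterval);
        }
    }

    int numStones() {
        std::lock_guard<std::mutex> inner(_mutex);
        return _numStones;
    }

private:
    static constexpr int kMaxStones = 2;  // assumed threshold for illustration
    std::mutex _reclaimMutex;             // outer
    std::mutex _mutex;                    // inner
    std::condition_variable _reclaimCv;
    int _numStones = 0;
};

// Terminates even if the notification is lost, at the cost of periodic polling.
inline int runPollingDemo() {
    PollingStones stones;
    std::thread waiter([&stones] { stones.awaitExcessStones(); });
    for (int i = 0; i < 3; ++i) {
        stones.addStone();
    }
    waiter.join();
    assert(stones.numStones() == 3);
    return stones.numStones();
}
```

The periodic acquisition of the inner mutex in the loop is the unnecessary locking that the description counts against this option.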



 Comments   
Comment by Githook User [ 12/Jun/20 ]

Author: Gregory Wlodarek <gregory.wlodarek@mongodb.com> (GWlodarek)

Message: SERVER-48279 Eliminate race in WiredTigerRecordStore::OplogStones::awaitHasExcessStonesOrDead

(cherry picked from commit bf8cb71787d18a907173dd67a8ff9950e56a4199)
Branch: v4.4
https://github.com/mongodb/mongo/commit/6f7d770b36a493a724cde0a2d2a7efae62b30963

Comment by Githook User [ 05/Jun/20 ]

Author: Gregory Wlodarek <gregory.wlodarek@mongodb.com> (GWlodarek)

Message: SERVER-48279 Eliminate race in WiredTigerRecordStore::OplogStones::awaitHasExcessStonesOrDead
Branch: master
https://github.com/mongodb/mongo/commit/bf8cb71787d18a907173dd67a8ff9950e56a4199
