[SERVER-62884] Simplify synchronization semantics of `PrimaryOnlyService` Created: 21/Jan/22  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-62682 PrimaryOnlyService Does Not Call _reb... Closed
Assigned Teams:
Service Arch
Participants:

Description:

The main intention here is to simplify the waiter/notifier pattern around PrimaryOnlyService::_rebuildInstances. In particular, callers of PrimaryOnlyService::getOrCreateInstance, PrimaryOnlyService::lookupInstance, and PrimaryOnlyService::getAllInstances block on _rebuildCV using the following:

opCtx->waitForConditionOrInterrupt(_rebuildCV, lk, [this]() { return _state != State::kRebuilding; });

However, PrimaryOnlyService::_rebuildInstances may also call notify_all on this condition variable (i.e., _rebuildCV) on its early-return path, which is taken when the state is no longer kRebuilding or when the term has changed:

...
stdx::lock_guard lk(_mutex);
if (_state != State::kRebuilding || _term != term) {
    _rebuildCV.notify_all();
    return;
}
...
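
To make the concern concrete, here is a minimal sketch using only the C++ standard library. It is not the MongoDB code: ModelService, waitForRebuild, rebuildEarlyReturn, and waiterStaysBlockedOnTermChange are hypothetical names, and std::condition_variable stands in for the opCtx-based wait. Under those assumptions, it shows that a notify_all triggered only by a term change does not release a waiter whose predicate is `state != kRebuilding`; the waiter wakes, re-checks the predicate, and goes back to waiting.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the service state; not the real MongoDB class.
enum class State { kRebuilding, kRunning };

struct ModelService {
    std::mutex mutex;
    std::condition_variable rebuildCV;
    State state = State::kRebuilding;
    long long term = 1;

    // Models the waiter: returns only once state != kRebuilding.
    void waitForRebuild() {
        std::unique_lock<std::mutex> lk(mutex);
        rebuildCV.wait(lk, [this] { return state != State::kRebuilding; });
    }

    // Models the early-return path: notifies when the state or the term
    // no longer matches. Returns true if the early return was taken.
    bool rebuildEarlyReturn(long long rebuildTerm) {
        std::lock_guard<std::mutex> lk(mutex);
        if (state != State::kRebuilding || term != rebuildTerm) {
            rebuildCV.notify_all();
            return true;
        }
        return false;
    }
};

// Returns true if the waiter was still blocked after a notification caused
// only by a term change, and was released once the state actually changed.
bool waiterStaysBlockedOnTermChange() {
    ModelService svc;
    std::atomic<bool> done{false};
    std::thread waiter([&] {
        svc.waitForRebuild();
        done = true;
    });

    {
        // The term moves on while the state is still kRebuilding.
        std::lock_guard<std::mutex> lk(svc.mutex);
        svc.term = 2;
    }
    svc.rebuildEarlyReturn(1);  // stale term: takes the early return and notifies
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    const bool stillBlocked = !done.load();  // predicate is still false

    {
        // Only an actual state change satisfies the waiter's predicate.
        std::lock_guard<std::mutex> lk(svc.mutex);
        svc.state = State::kRunning;
    }
    svc.rebuildCV.notify_all();
    waiter.join();
    return stillBlocked && done.load();
}
```

In this model the term-change notification is harmless but also has no visible effect on waiters, which is the kind of ambiguity this ticket asks to clarify.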

We should simplify/clarify this code and the logic around notifying threads that await completion of PrimaryOnlyService::_rebuildInstances.

Acceptance criteria: clarify when a thread blocks on _rebuildCV, which events end the wait, and what the expected behavior is for each of those events. Then modify the code to match the findings.
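
One possible way to document the outcome of that analysis is to have the wait report why it ended. The sketch below is an illustration only and is not taken from the server code: WaitModel, SvcState, WakeReason, setState, and interrupt are hypothetical names, the set of states is an assumption, and interruption is modeled with a plain flag rather than an operation context.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical states and wake-up reasons, chosen for illustration.
enum class SvcState { kRebuilding, kRunning, kPaused };
enum class WakeReason { kRebuildFinished, kLeftRebuilding, kInterrupted };

struct WaitModel {
    std::mutex mutex;
    std::condition_variable rebuildCV;
    SvcState state = SvcState::kRebuilding;
    bool interrupted = false;  // stand-in for an interrupted operation

    // Blocks while the state is kRebuilding and no interrupt was requested,
    // then reports which event ended the wait.
    WakeReason waitForRebuild() {
        std::unique_lock<std::mutex> lk(mutex);
        rebuildCV.wait(lk, [this] {
            return interrupted || state != SvcState::kRebuilding;
        });
        if (interrupted) {
            return WakeReason::kInterrupted;
        }
        return state == SvcState::kRunning ? WakeReason::kRebuildFinished
                                           : WakeReason::kLeftRebuilding;
    }

    // Every state transition notifies under the same rule, so waiters
    // never depend on which code path performed the transition.
    void setState(SvcState newState) {
        {
            std::lock_guard<std::mutex> lk(mutex);
            state = newState;
        }
        rebuildCV.notify_all();
    }

    void interrupt() {
        {
            std::lock_guard<std::mutex> lk(mutex);
            interrupted = true;
        }
        rebuildCV.notify_all();
    }
};
```

With this shape, each caller can state its expected behavior per WakeReason value, and notifications happen only where an event that a waiter can observe has actually occurred.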

