- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 5.0 Required
- Component/s: Internal Code
- None
- Fully Compatible
- ALL
- Service Arch 2021-03-22
- 135
This ticket should fix the data races in PrimaryOnlyService and simplify the interruption/shutdown process. Currently, there are two known data races:
- The list of operations (_opCtxs) is accessed here (also shown below) without any synchronization.
- We identify running instances by checking _running, a non-synchronized boolean that guards another non-synchronized variable (i.e., _finishedNotifyFuture). Both values are set by an executor thread (here) without any synchronization.
    for (auto opCtx : _opCtxs) {
        stdx::lock_guard<Client> clientLock(*opCtx->getClient());
        _serviceContext->killOperation(clientLock, opCtx, ErrorCodes::InterruptedAtShutdown);
    }
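The first race could be closed by routing every access to the operation list through a single mutex. The following is only a minimal sketch under stated assumptions: FakeOpCtx and OpCtxTracker are hypothetical stand-ins, not MongoDB types, and std::mutex is used in place of whatever mutex type the server code would use.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for OperationContext; not the real MongoDB type.
struct FakeOpCtx {
    std::atomic<bool> killed{false};
};

// Hypothetical tracker: every read or write of _opCtxs happens under _mutex.
class OpCtxTracker {
public:
    void add(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.insert(opCtx);
    }

    void remove(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.erase(opCtx);
    }

    // Marks every tracked operation as killed while holding the lock, so the
    // set cannot change during iteration. Returns the number of operations marked.
    std::size_t killAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (FakeOpCtx* opCtx : _opCtxs) {
            opCtx->killed.store(true);
        }
        return _opCtxs.size();
    }

private:
    std::mutex _mutex;
    std::unordered_set<FakeOpCtx*> _opCtxs;
};
```

Locking alone makes the iteration safe, but it does not stop a new operation from being registered right after killAll() returns.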
This ticket should also propose a solution to fix the interruption pattern for primary only services. Interrupting the operations, as done here and here, is not sufficient: it is inherently racy, because another thread may create a new operation after the shutdown/stepDown thread has finished interrupting/killing the existing operations. One solution could be piggybacking on the interrupt interface (defined here) and throwing the interruption status on any attempt to create a new opCtx (e.g., by throwing from the operation observer in primary only service).
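The proposed pattern can be illustrated with a small gate that records the interruption under a lock and rejects any later attempt to create an operation. This is a hedged sketch, not the fix that was implemented: InterruptGate, InterruptedError, onCreateOperation, and onDestroyOperation are hypothetical names chosen for illustration, and the real hook would live in whatever observer the service uses.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical error type standing in for an interruption status.
struct InterruptedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical gate: the interrupted flag and the active-operation count
// share one mutex, so "check flag, then register" is a single atomic step.
class InterruptGate {
public:
    // Would be called from the hook that runs when a new operation is created.
    // Throws if the service has already been interrupted.
    void onCreateOperation() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_interrupted) {
            throw InterruptedError(_reason);
        }
        ++_activeOps;
    }

    void onDestroyOperation() {
        std::lock_guard<std::mutex> lk(_mutex);
        --_activeOps;
    }

    // Records the interruption and returns how many operations were active
    // at that moment; those are the ones the caller still has to kill.
    int interrupt(std::string reason) {
        std::lock_guard<std::mutex> lk(_mutex);
        _interrupted = true;
        _reason = std::move(reason);
        return _activeOps;
    }

private:
    std::mutex _mutex;
    bool _interrupted = false;
    std::string _reason;
    int _activeOps = 0;
};
```

Because the flag is checked under the same lock that registers the operation, no operation can be created between the interruption and the kill loop, which is the window described above.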
- is related to: SERVER-52791 Make PrimaryOnlyService::shutdown explicitly interrupt tracked OperationContexts (Closed)