[SERVER-54839] Fix data races in PrimaryOnlyService Created: 26/Feb/21  Updated: 29/Oct/23  Resolved: 10/Mar/21

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 5.0 Required
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Matthew Saltz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-52791 Make PrimaryOnlyService::shutdown exp... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2021-03-22
Participants:
Linked BF Score: 135

 Description   

This ticket should fix the data races in PrimaryOnlyService and simplify the interruption/shutdown process. Currently, there are two known data races:

  • The list of operations (_opCtxs) is accessed here (also shown below) without any synchronization.
  • We identify running instances by looking at _running, a non-synchronized boolean that is used to guard another non-synchronized variable (i.e., _finishedNotifyFuture). Both values are set by an executor thread (here) without synchronization.

// Racy: _opCtxs itself is read with no lock held; only each opCtx's Client is locked.
for (auto opCtx : _opCtxs) {
    stdx::lock_guard<Client> clientLock(*opCtx->getClient());
    _serviceContext->killOperation(clientLock, opCtx, ErrorCodes::InterruptedAtShutdown);
}
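The second race can be avoided by only reading or writing the running flag and the "finished" future while holding one mutex, so a reader can never observe the flag set while the future is stale or unset. The sketch below is a minimal illustration under stated assumptions: it uses std::promise/std::shared_future in place of the server's own future types, and InstanceState, markRunning, markFinished, and finishedFutureIfRunning are hypothetical names, not MongoDB's API.

```cpp
#include <future>
#include <mutex>
#include <optional>

// Hypothetical stand-in for the per-instance state described above; the
// class and method names are illustrative, not MongoDB's actual API.
class InstanceState {
public:
    // Called by the executor thread when the instance starts running.
    void markRunning() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_running) {
            return;  // Already running; keep the existing promise/future pair.
        }
        _running = true;
        _finishedPromise = std::promise<void>();
        _finishedFuture = _finishedPromise.get_future().share();
    }

    // Called by the executor thread when the instance completes.
    void markFinished() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_running) {
            return;
        }
        _running = false;
        _finishedPromise.set_value();  // Wakes anyone waiting on the future.
    }

    // Safe from any thread: the flag and the future are read under the same
    // lock the writers hold, so they are always observed as a consistent pair.
    std::optional<std::shared_future<void>> finishedFutureIfRunning() const {
        std::lock_guard<std::mutex> lk(_mutex);
        if (!_running) {
            return std::nullopt;
        }
        return _finishedFuture;
    }

private:
    mutable std::mutex _mutex;
    bool _running = false;                     // Guarded by _mutex.
    std::promise<void> _finishedPromise;       // Guarded by _mutex.
    std::shared_future<void> _finishedFuture;  // Guarded by _mutex.
};
```

A stepDown or shutdown path would call finishedFutureIfRunning() and, if it gets a future back, wait on that copy after the lock has been released, so waiting never blocks the executor thread that needs the mutex to call markFinished().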

This ticket should also propose a solution to fix the interruption pattern for primary only services. Interrupting the operations, as done here and here, is not sufficient: it is inherently racy, because another thread may create a new operation after the shutdown/stepDown thread has finished interrupting/killing the existing operations. One solution could be piggybacking on the interrupt interface (defined here) and throwing the interruption status on any attempt to create a new opCtx (e.g., by throwing from the operation observer in the primary only service).
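One way to close this window is to make interruption and operation registration mutually exclusive: the interrupting thread records the interruption status and kills every registered operation under a single mutex, and any later registration attempt sees the recorded status and throws. The sketch below uses only the C++ standard library and is not drawn from the fix that was merged for this ticket. OpCtxRegistry, FakeOpCtx, and their methods are hypothetical; an atomic flag stands in for ServiceContext::killOperation, and std::runtime_error stands in for throwing the interruption status.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for an OperationContext; only records whether it was killed.
struct FakeOpCtx {
    std::atomic<bool> killed{false};
};

// Hypothetical registry illustrating "interrupt, then refuse new operations";
// not MongoDB's actual PrimaryOnlyService code.
class OpCtxRegistry {
public:
    // Called whenever a service thread creates an operation. Throws if the
    // service was already interrupted, so no operation can slip in afterwards.
    void registerOpCtx(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_interruptReason) {
            throw std::runtime_error("operation rejected: " + *_interruptReason);
        }
        _opCtxs.insert(opCtx);
    }

    void unregisterOpCtx(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.erase(opCtx);
    }

    // Called by the shutdown/stepDown thread. Recording the reason and killing
    // existing operations happen under one lock, so a concurrent registerOpCtx()
    // either completes first (and its opCtx is killed here) or runs later and throws.
    void interruptAll(const std::string& reason) {
        std::lock_guard<std::mutex> lk(_mutex);
        _interruptReason = reason;
        for (FakeOpCtx* opCtx : _opCtxs) {
            opCtx->killed.store(true);
        }
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _opCtxs.size();
    }

private:
    mutable std::mutex _mutex;
    std::optional<std::string> _interruptReason;  // Guarded by _mutex.
    std::unordered_set<FakeOpCtx*> _opCtxs;       // Guarded by _mutex.
};
```

The key design choice is that the "interrupted" state is sticky and checked at creation time under the same lock used for killing, which is what removes the gap between killing existing operations and a new one being created.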



 Comments   
Comment by Githook User [ 10/Mar/21 ]

Author: Matthew Saltz <matthew.saltz@mongodb.com> (username: saltzm)

Message: SERVER-54839 Fix PrimaryOnlyService data races
Branch: master
https://github.com/mongodb/mongo/commit/634d4ba3e14c14d34b234af3a40ea1a14f95f2fb

Generated at Thu Feb 08 05:34:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.