  1. Core Server
  2. SERVER-54839

Fix data races in PrimaryOnlyService

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.9.0
    • Affects Version/s: 5.0 Required
    • Component/s: Internal Code
    • None
    • Fully Compatible
    • ALL
    • Service Arch 2021-03-22
    • 135

      This ticket should fix the data races in PrimaryOnlyService and simplify the interruption/shutdown process. Currently, there are two known data races:

      • The list of operations is accessed here (also shown below) without any synchronization.
      • We identify running instances by looking at _running, a non-synchronized boolean that is used to guard another non-synchronized variable (i.e., _finishedNotifyFuture). Both values are set by an executor thread (here), again without synchronization.

      for (auto opCtx : _opCtxs) {
          stdx::lock_guard<Client> clientLock(*opCtx->getClient());
          _serviceContext->killOperation(clientLock, opCtx, ErrorCodes::InterruptedAtShutdown);
      }
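
One way to close the first race is to guard every access to the operation list with a mutex. The following is a minimal, standalone sketch of that idea, not the server's actual code: `FakeOpCtx` and `GuardedOpRegistry` are hypothetical stand-ins, and the real kill path (taking the client lock and calling `killOperation`) is reduced to setting a flag.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for an operation context; not the server's real type.
struct FakeOpCtx {
    bool killed = false;
};

// Hypothetical registry in which every access to the operation list
// happens while holding _mutex.
class GuardedOpRegistry {
public:
    void registerOp(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.insert(opCtx);
    }

    void unregisterOp(FakeOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.erase(opCtx);
    }

    // Marks every registered operation as killed while holding the mutex,
    // so the iteration cannot race with registerOp/unregisterOp.
    // Returns the number of operations that were marked.
    std::size_t killAll() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (FakeOpCtx* opCtx : _opCtxs) {
            opCtx->killed = true;
        }
        return _opCtxs.size();
    }

private:
    std::mutex _mutex;
    std::unordered_set<FakeOpCtx*> _opCtxs;  // guarded by _mutex
};
```

The same mutex could presumably also guard _running and _finishedNotifyFuture, so that readers never observe one updated without the other.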
      

      This ticket should also propose a solution to fix the interruption pattern for primary only services. Interrupting the operations, as done here and here, is not sufficient: it is inherently racy, because another thread may create a new operation after the shutdown/stepDown thread has finished interrupting/killing the existing operations. One solution could be piggybacking on the interrupt interface (defined here) and throwing the interruption status on any attempt to create a new opCtx (e.g., by throwing from the operation observer in primary only service).
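
The proposal above can be sketched as a gate that records the interruption reason and kills existing operations in the same critical section, then rejects any later attempt to register a new operation. This is only an illustration under stated assumptions: `SketchOpCtx`, `InterruptibleOpGate`, `onCreate`, and `onDestroy` are hypothetical names, `std::runtime_error` stands in for throwing the interruption status, and client locking is omitted.

```cpp
#include <mutex>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for an operation context; not the server's real type.
struct SketchOpCtx {
    std::optional<std::string> killReason;
};

// Hypothetical gate: once interrupted, no new operation can be registered.
class InterruptibleOpGate {
public:
    // Called when a new operation is created. Throws if the service has
    // already been interrupted, so no operation can slip in afterwards.
    void onCreate(SketchOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_interruptReason) {
            throw std::runtime_error(*_interruptReason);
        }
        _opCtxs.insert(opCtx);
    }

    // Called when an operation is destroyed.
    void onDestroy(SketchOpCtx* opCtx) {
        std::lock_guard<std::mutex> lk(_mutex);
        _opCtxs.erase(opCtx);
    }

    // Records the reason and marks all existing operations as killed in one
    // critical section, which closes the window between "kill existing
    // operations" and "reject new ones".
    void interrupt(const std::string& reason) {
        std::lock_guard<std::mutex> lk(_mutex);
        _interruptReason = reason;
        for (SketchOpCtx* opCtx : _opCtxs) {
            opCtx->killReason = reason;
        }
    }

private:
    std::mutex _mutex;
    std::optional<std::string> _interruptReason;  // guarded by _mutex
    std::unordered_set<SketchOpCtx*> _opCtxs;     // guarded by _mutex
};
```

Because the check in `onCreate` and the update in `interrupt` share one mutex, a creating thread either registers before the interruption (and is killed by it) or observes the interruption and throws.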

            Assignee:
            matthew.saltz@mongodb.com Matthew Saltz (Inactive)
            Reporter:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour