Core Server / SERVER-51650

Primary-Only Service's _rebuildCV should be notified even if stepdown happens quickly after stepup

Details

    • Bug
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • 6.1.0-rc0
    • None
    • Fully Compatible
    • Service Arch 2022-05-30, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11, Service Arch 2022-07-25
    • 131
    • 4

    Description

      When stepup completes, each Primary-Only Service's _state is set to kRebuilding.

      Both PrimaryOnlyService::lookupInstance and PrimaryOnlyService::getOrCreateInstance wait until the _rebuildCV condition variable is notified and the _state is no longer kRebuilding.

      _rebuildCV is notified in PrimaryOnlyService::_rebuildInstances, which on stepup is scheduled to run asynchronously.
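      The wait-and-notify relationship described above can be sketched as a small stand-alone model. This is only an illustration under stated assumptions, not the server code: ServiceModel, waitForRebuild, and the kPaused and kRunning state names are hypothetical, and a plain std::condition_variable stands in for the server's interruptible wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of a Primary-Only Service. Only kRebuilding is named in
// the ticket; kPaused and kRunning are placeholder states for this sketch.
enum class State { kPaused, kRebuilding, kRunning };

class ServiceModel {
public:
    // Stepup: mark the service as rebuilding. In the server, the rebuild
    // itself is scheduled to run asynchronously after this point.
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(_state == State::kPaused);  // this model only steps up from paused
        _state = State::kRebuilding;
    }

    // Stand-in for _rebuildInstances: leave kRebuilding, then wake all waiters.
    void rebuildInstances() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _state = State::kRunning;
        }
        _rebuildCV.notify_all();
    }

    // Stand-in for the wait in lookupInstance / getOrCreateInstance: block
    // until the state is no longer kRebuilding, then report the state seen.
    State waitForRebuild() {
        std::unique_lock<std::mutex> lk(_mutex);
        _rebuildCV.wait(lk, [this] { return _state != State::kRebuilding; });
        return _state;
    }

private:
    std::mutex _mutex;
    std::condition_variable _rebuildCV;
    State _state = State::kPaused;
};
```

      In this model, a thread that calls waitForRebuild() while the state is kRebuilding sleeps until some other thread calls notify_all() on _rebuildCV. Here the only such call is in rebuildInstances(), which mirrors the dependency the ticket describes.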

      If stepdown occurs before _rebuildInstances starts, e.g. if stepdown occurs here, then _rebuildCV may never be notified. As a result, any thread blocked in lookupInstance or getOrCreateInstance that is not interrupted by stepdown will block indefinitely.

      Currently, there is an invariant in lookupInstance that the thread is guaranteed to be interrupted by stepdown. Otherwise, if the thread is holding the RSTL lock, the thread would prevent the stepdown from completing, leading to a deadlock.

      It would be better to notify _rebuildCV here to guarantee that threads cannot block indefinitely in lookupInstance or getOrCreateInstance.
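      A minimal sketch of the suggested direction, assuming the stepdown path is where the extra notification goes. The names NotifyingServiceModel, onStepDown, waitWhileRebuilding, and the kPaused and kRunning states are hypothetical and chosen for illustration; the real change lives in PrimaryOnlyService and may differ in detail.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical states; only kRebuilding is named in the ticket.
enum class PosState { kPaused, kRebuilding, kRunning };

class NotifyingServiceModel {
public:
    // Stepup: enter kRebuilding. The asynchronous rebuild is omitted here to
    // model the case where stepdown arrives before the rebuild ever starts.
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(_state == PosState::kPaused);  // this model only steps up from paused
        _state = PosState::kRebuilding;
    }

    // Stepdown: leave kRebuilding AND notify, so a waiter that is already
    // asleep re-checks its predicate instead of relying on the rebuild task.
    void onStepDown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _state = PosState::kPaused;
        }
        _rebuildCV.notify_all();
    }

    // Stand-in for the wait in lookupInstance / getOrCreateInstance.
    PosState waitWhileRebuilding() {
        std::unique_lock<std::mutex> lk(_mutex);
        _rebuildCV.wait(lk, [this] { return _state != PosState::kRebuilding; });
        return _state;
    }

private:
    std::mutex _mutex;
    std::condition_variable _rebuildCV;
    PosState _state = PosState::kPaused;
};
```

      With the notify_all() call in onStepDown(), a thread that went to sleep in waitWhileRebuilding() before stepdown is woken, sees that the state is no longer kRebuilding, and returns. Without that call, the same thread would depend on a rebuild task that may never run.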

       

      Acceptance criteria: 

      • Reproduce the issue in a unit test
      • Fix as suggested

            People

              george.wangensteen@mongodb.com George Wangensteen
              esha.maharishi@mongodb.com Esha Maharishi