  1. Core Server
  2. SERVER-51650

Primary-Only Service's _rebuildCV should be notified even if stepdown happens quickly after stepup

    • Service Arch
    • Fully Compatible
    • v6.1
    • Service Arch 2022-05-30, Service Arch 2022-06-13, Service Arch 2022-06-27, Service Arch 2022-07-11, Service Arch 2022-07-25, Service Arch 2023-02-20, Service Arch 2023-03-06, Service Arch 2023-05-01, Service Arch 2023-05-15, Service Arch 2023-05-29, Service Arch 2023-06-12
    • 131
    • 4

      When stepup completes, each Primary-Only Service's _state is set to kRebuilding.

      Both PrimaryOnlyService::lookupInstance and PrimaryOnlyService::getOrCreateInstance wait until the _rebuildCV condition variable is notified and the _state is no longer kRebuilding.

      _rebuildCV is notified in PrimaryOnlyService::_rebuildInstances, which on stepup is scheduled to run asynchronously.

      If stepdown occurs before _rebuildInstances starts, e.g. if stepdown occurs here, then _rebuildCV may never be notified. So, any threads blocking in lookupInstance or getOrCreateInstance that don't get interrupted by stepdown will block indefinitely.
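
      The sequence above can be traced with a small single-threaded model. This is only a sketch, not the server's code: ToyServiceModel, its State values other than kRebuilding, the notification counter, and the pendingRebuild slot are all hypothetical names, and the model assumes that stepdown simply drops the scheduled rebuild task before it runs.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>

// Toy, single-threaded model of the sequence in the ticket; not MongoDB's code.
struct ToyServiceModel {
    // Only kRebuilding is named in the ticket; the other values are assumptions.
    enum class State { kPaused, kRebuilding, kRunning };

    State state = State::kPaused;
    int rebuildCvNotifications = 0;  // counts notifications of the modeled _rebuildCV
    std::optional<std::function<void()>> pendingRebuild;  // scheduled, not yet run

    // Stepup: enter kRebuilding and schedule the rebuild to run later.
    void onStepUp() {
        state = State::kRebuilding;
        pendingRebuild = [this] { rebuildInstances(); };
    }

    // The only place in this model where the modeled _rebuildCV is notified.
    void rebuildInstances() {
        assert(state == State::kRebuilding);
        state = State::kRunning;
        ++rebuildCvNotifications;
    }

    // Stepdown as described in the ticket: the state changes and the pending
    // task is dropped (assumption), but nothing notifies the waiters.
    void onStepDown() {
        state = State::kPaused;
        pendingRebuild.reset();
    }

    // Models the executor eventually running the scheduled task, if any.
    void runPendingTask() {
        if (pendingRebuild) {
            auto task = std::move(*pendingRebuild);
            pendingRebuild.reset();
            task();
        }
    }
};
```

      Calling onStepUp(), then onStepDown(), then runPendingTask() leaves the notification count at zero in this model, which is the window in which a real waiter would have nothing to wake it.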

      Currently, an invariant in lookupInstance asserts that the waiting thread is guaranteed to be interrupted by stepdown. Otherwise, a thread holding the RSTL would prevent the stepdown from completing, leading to a deadlock.

      It would be better to notify _rebuildCV here to guarantee threads cannot block indefinitely in lookupInstance or getOrCreateInstance.
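
      One way to picture the suggested fix is the sketch below, which uses the standard library's std::condition_variable rather than the server's own primitives. FixedToyService, its State values other than kRebuilding, and the stepDownWhileWaiting driver are hypothetical; the point is only that stepdown changes the state under the mutex and then notifies, so a waiter cannot miss the transition.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy model of the proposed behavior; not MongoDB's actual implementation.
class FixedToyService {
public:
    // Only kRebuilding is named in the ticket; the other values are assumptions.
    enum class State { kPaused, kRebuilding, kRunning };

    // Stepup: the service enters kRebuilding.
    void onStepUp() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = State::kRebuilding;
    }

    // Stepdown: leave kRebuilding under the mutex, then wake every waiter.
    void onStepDown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _state = State::kPaused;
        }
        _rebuildCV.notify_all();
    }

    // Models the wait in lookupInstance / getOrCreateInstance.
    State waitWhileRebuilding() {
        std::unique_lock<std::mutex> lk(_mutex);
        _rebuildCV.wait(lk, [this] { return _state != State::kRebuilding; });
        return _state;
    }

private:
    std::mutex _mutex;
    std::condition_variable _rebuildCV;
    State _state = State::kPaused;
};

// Hypothetical driver: one waiter, then a stepdown before any rebuild runs.
inline FixedToyService::State stepDownWhileWaiting() {
    FixedToyService svc;
    svc.onStepUp();
    auto observed = FixedToyService::State::kRebuilding;
    std::thread waiter([&] { observed = svc.waitWhileRebuilding(); });
    svc.onStepDown();
    waiter.join();  // returns because stepdown notified the condition variable
    assert(observed != FixedToyService::State::kRebuilding);
    return observed;
}
```

      Because the state change happens while holding the mutex, the waiter either sees the new state before it sleeps or is woken by notify_all(); in both cases the join completes.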

       

      Acceptance criteria:

        • Reproduce the issue in a unit test
        • Fix as suggested

            Assignee: Wenbin Zhu (wenbin.zhu@mongodb.com)
            Reporter: Esha Maharishi (Inactive) (esha.maharishi@mongodb.com)