Starting up and shutting down the replicated fast count thread in rapid succession can cause a hang


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 9.0.0-rc0, 8.3.2
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution
    • Fully Compatible
    • ALL
    • v8.3
    • Storage Execution 2026-03-30

      We set the atomic _isEnabled to false when shutting down the fast count thread.

      The fast count thread waits on a condition variable whose predicate checks the values of _flushRequested and _isEnabled.

      Shutdown sets _isEnabled to false without holding the metadata mutex. If the thread is started and then shut down quickly, a lost wakeup can occur: the condition variable wait can miss that _isEnabled was set to false before the thread blocks. Unless the thread wakes spuriously and rechecks the value, which is not guaranteed to happen, it keeps waiting indefinitely, and shutdown hangs waiting for the thread to be joined.
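
      The race described above can be illustrated with a minimal, self-contained sketch. It is not the server code and does not claim to be the fix that was committed for this ticket: FlushWorker, requestFlush, and startStopCycles are hypothetical names, and only _isEnabled and _flushRequested mirror the identifiers mentioned here. The sketch shows one common remedy for this kind of lost wakeup, which is to update the flag while holding the same mutex the waiter uses, even though the flag is atomic, and then notify.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the fast count thread; not the real server class.
class FlushWorker {
public:
    void start() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _isEnabled.store(true);
        }
        _thread = std::thread([this] { _run(); });
    }

    void requestFlush() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _flushRequested = true;
        }
        _cv.notify_one();
    }

    void shutdown() {
        {
            // Holding the mutex means this store cannot land between the
            // waiter's predicate check and the moment it blocks.
            std::lock_guard<std::mutex> lk(_mutex);
            _isEnabled.store(false);
        }
        _cv.notify_all();
        if (_thread.joinable()) {
            _thread.join();
        }
        assert(!_thread.joinable());
    }

private:
    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (true) {
            // The predicate is evaluated with the mutex held.
            _cv.wait(lk, [this] { return _flushRequested || !_isEnabled.load(); });
            if (!_isEnabled.load()) {
                return;  // shutdown was requested
            }
            _flushRequested = false;  // placeholder for the actual flush work
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::atomic<bool> _isEnabled{false};
    bool _flushRequested = false;  // guarded by _mutex
    std::thread _thread;
};

// Starts and immediately shuts down a worker n times; returns the number of
// cycles that completed. If a wakeup were lost, shutdown() would block in join().
int startStopCycles(int n) {
    int completed = 0;
    for (int i = 0; i < n; ++i) {
        FlushWorker worker;
        worker.start();
        worker.shutdown();
        ++completed;
    }
    return completed;
}
```

      If the store in shutdown() were moved outside the lock, the waiter could evaluate the predicate while _isEnabled is still true, the store and notify could then both happen before the waiter blocks, and the notification would be missed.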

            Assignee:
            Damian Wasilewicz
            Reporter:
            Damian Wasilewicz
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: