  1. Core Server
  2. SERVER-27048

Fix recursive lock issue leading to deadlock or crash in LegacySession

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      Race conditions are hard to reproduce.
    • Sprint:
      Platforms 2016-11-21
    • Linked BF Score:
      0

      Description

      The `TransportLayerLegacy::endAllSessions` function acquires the `_sessionsMutex` lock. The same lock is also acquired by the `TransportLayerLegacy::_destroy` method, which is called indirectly by the `TransportLayerLegacy::LegacySession::~LegacySession()` destructor.

      Within `endAllSessions`, each weak pointer in the `_sessions` list is promoted in turn to a shared pointer, processed, and then the shared pointer is discarded.
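
      The promotion loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual server code: `Session`, `SessionRegistry`, and `end()` are hypothetical stand-ins, and the return value exists only to make the sketch observable.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a session object; not the real LegacySession.
struct Session {
    bool ended = false;
    void end() { ended = true; }
};

// Hypothetical registry that mirrors the shape described in the ticket:
// a mutex guarding a list of weak pointers to sessions.
class SessionRegistry {
public:
    void add(const std::shared_ptr<Session>& s) {
        std::lock_guard<std::mutex> lk(_sessionsMutex);
        _sessions.push_back(s);  // stored as weak_ptr; does not extend lifetime
    }

    // Returns how many sessions were still alive and were processed.
    std::size_t endAllSessions() {
        std::lock_guard<std::mutex> lk(_sessionsMutex);
        std::size_t processed = 0;
        for (auto& weak : _sessions) {
            // Promotion: lock() yields an owning shared_ptr,
            // or an empty one if the session has already expired.
            if (std::shared_ptr<Session> session = weak.lock()) {
                session->end();
                ++processed;
            }
            // The promoted shared_ptr is discarded here, while
            // _sessionsMutex is still held by this thread.
        }
        return processed;
    }

private:
    std::mutex _sessionsMutex;
    std::list<std::weak_ptr<Session>> _sessions;
};
```

      In this sketch the `Session` destructor does not touch the mutex, so the loop is safe; the ticket concerns the case where the destructor does.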

      This leads to a pair of difficult-to-reproduce race conditions in `endAllSessions`:

      1. The failing case in BF-4102: promoting the weak pointer creates a shared pointer, making the ref-count of its object at least 2. Other threads then dispose of their shared pointers to this object, leaving the promoted shared pointer as the sole owner. That shared pointer goes out of scope at the end of the loop iteration processing it, which invokes the destructor while `_sessionsMutex` is still held. The destructor indirectly calls `TransportLayerLegacy::_destroy`, which attempts to take the same lock. Locking a `std::mutex` that the calling thread already owns is undefined behavior. Typical implementations either deadlock or throw `std::system_error` (as encouraged by the standard, although this is non-normative). In BF-4102, the attempt to take the lock in `_destroy` threw an exception, which is the precise observed behavior.

      2. An entry in the weak pointer list has expired because the last owning shared pointers to it were destroyed. Promoting the weak pointer fails, yielding an empty (nullptr) `shared_ptr`. The code skips over empty promoted pointers, so this case has no failing actions.
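
      Both cases can be made visible in a single-threaded sketch. Everything here is an assumption for illustration: `TrackedSession`, `TrackingRegistry`, the `afterPromote` hook (which stands in for other threads dropping their owners), and the `_lockHeld` flag are hypothetical. The destructor deliberately only records that it ran inside the locked region rather than re-locking, because re-locking would be undefined behavior.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <vector>

class TrackingRegistry;

// Hypothetical session whose destructor reports back to its registry,
// similar in spirit to a destructor that indirectly calls _destroy.
struct TrackedSession {
    explicit TrackedSession(TrackingRegistry* r) : registry(r) {}
    ~TrackedSession();
    TrackingRegistry* registry;
};

class TrackingRegistry {
public:
    void add(const std::shared_ptr<TrackedSession>& s) {
        std::lock_guard<std::mutex> lk(_mutex);
        _sessions.push_back(s);
    }

    // afterPromote runs right after a successful promotion and simulates
    // other owners releasing the session at that moment.
    void endAllSessions(const std::function<void()>& afterPromote) {
        std::lock_guard<std::mutex> lk(_mutex);
        _lockHeld = true;  // plain flag: this sketch is single-threaded
        for (auto& weak : _sessions) {
            std::shared_ptr<TrackedSession> session = weak.lock();
            if (!session) {
                ++skippedExpired;  // case 2: expired entry, nothing to do
                continue;
            }
            afterPromote();
            // case 1: if `session` is now the last owner, its destructor
            // runs at the end of this iteration, still under _mutex.
        }
        _lockHeld = false;
    }

    // A real _destroy would try to lock the mutex here; this sketch only
    // records whether that attempt would have been recursive.
    void onSessionDestroyed() {
        if (_lockHeld) ++destroyedUnderLock;
    }

    int destroyedUnderLock = 0;
    int skippedExpired = 0;

private:
    std::mutex _mutex;
    bool _lockHeld = false;
    std::vector<std::weak_ptr<TrackedSession>> _sessions;
};

inline TrackedSession::~TrackedSession() { registry->onSessionDestroyed(); }
```

      A non-zero `destroyedUnderLock` marks exactly the situation in which a destructor that locks the same mutex would deadlock or throw.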
