Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-41214

Query yield recovery does not check for collection rename under the correct locks

    XMLWordPrintable

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.2
    • Sprint:
      Query 2019-06-17, Query 2019-07-01
    • Linked BF Score:
      59

      Description

      Queries periodically yield and recover their intent locks using Locker::LockSnapshot. After restoring the lock snapshot, PlanExecutor::restoreState() is called in order to make several validity checks. As part of this yield recovery process, we make a check for collection rename and throw an exception if the Collection object's namespace string has changed during yield.

      It is possible that this check is made without holding the correct lock on the Collection object. The resource IDs which we lock are collection names, not collection UUIDs. Therefore, if a collection is renamed, the identity of the lock that must be held to protect the Collection object also changes. Consider the case of a collection being renamed from A to B during yield. When the lock snapshot is restored, the query execution subsystem will continue to hold the lock on A, not B. This means that the call to Collection::ns() may not be correctly synchronized with other threads executing drops or renames on that collection.

      Credit to Geert Bosch for helping me discover this issue and think through a fix. The easiest fix would be to use CollectionCatalog::lookupNSSByUUID() instead of CollectionCatalog::lookupCollectionByUUID() in order to safely identify whether a rename has occurred. If the namespace string has not changed, then you know that the correct locks were restored from the lock snapshot and query execution can proceed. Otherwise, the query will be killed.

      Additional work will be required if we ever want to allow queries to survive renames. In particular, we may want to change query yielding so that it doesn't use the Locker::LockSnapshot interface, but rather uses the db_raii.h helpers directly on yield recovery.

        Attachments

          Activity

            People

            Assignee:
            david.storch David Storch
            Reporter:
            david.storch David Storch
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: