Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-72707

Change how classic query engine RequiresIndexStage looks up IndexCatalogEntry after yield

    • Type: Icon: Task Task
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 6.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Storage Execution
    • Fully Compatible
    • Execution Team 2023-02-06

      For reference, here's the ownership hierarchy for the collection catalog:

      CollectionCatalog
          -owns-> shared_ptr<Collection> 
              -owns-> IndexCatalog 
                  -owns-> shared_ptr<IndexCatalogEntry>
                      -owns->IndexAccessMethod
      

      This object hierarchy/ownership hierarchy might also help as a reference for the following description.

      IndexAccessMethod 
          -is parent of-> SortedDataIndexAccessMethod
              -owns-> SortedDataInterface
                  -is parent of (for example)-> WiredTigerIndex
      

      RequiresIndexStage is a query plan stage responsible for tracking an index used by a query and checking whether, after restoring from yield, that index has been dropped.

      Currently, it does this by, on construction, saving a weak_ptr to the IndexCatalogEntry for that index, and then after restoring from a yield, locking the weak_ptr to check if the IndexCatalogEntry has been destroyed or not. If locking the weak_ptr returns nullptr, it means this index has been dropped, and a QueryPlanKilled error is thrown.

      This has worked historically because all IndexCatalogEntry's were tracked in the catalog, and an IndexCatalogEntry would only be destroyed if the index were dropped (or the collection were dropped, which would make the index be dropped).

      For the point-in-time catalog project, however, when reading from an old version of the catalog, we sometimes create an entire new Collection object from scratch based on the durable catalog, including the IndexCatalogEntry s for any indexes at that point in time. This Collection object, however, is destroyed every time we yield, and recreated every time we restore from yield.

      So if we start a query with one of these Collection objects, after restoring from yield, the original weak_ptr<IndexCatalogEntry> will ALWAYS point to a shared_ptr<IndexCatalogEntry> that has already been destroyed (since the Collection object owning it will have been destroyed when yielding). This means that any queries using such an object will throw QueryPlanKilled after yielding.

      To fix this, we need to instead have another way for RequiresIndexStage to determine after restoring a yield whether an index has been dropped that doesn't rely on the lifetime of the specific IndexCatalogEntry object. This in turn requires us to make sure that anywhere else in the query plan that uses data from the IndexCatalogEntry object has that data properly restored after yield instead of holding onto invalid references. For instance IndexScan has a SortedDataInterface::Cursor that's initialized with the IndexAccessMethod from the specific IndexCatalogEntry used on construction of RequiresIndexStage, and does not refresh its view of the index access method when restoring from yield, which means that this reference to the IndexAccessMethod 's SortedDataInterface will become invalid when that IndexCatalogEntry is destroyed. So any solution to the problem specified in this ticket must deal with any such invalid references. (In the case of WiredTigerIndexCursorBase, this could be satisfied by making all public members that require use of the IndexAccessMethod to take the IndexAccessMethod as a parameter, rather than having the WiredTigerIndexCursorBase store a non-owned reference to the IndexAccessMethod 's SortedDataInterface.)

      The solution we've come up with is:

      • RequiresIndexStage will hold a std::string _idxIdent for the index it corresponds to, that it looks up and stores upon construction
      • When restoring from yield, it will call something like collection()->findIndexCatalogEntryByIndent(_idxIdent) to find the IndexCatalogEntry. If the collection does not have an index with this ident, Collection::findIndexCatalogEntryByIdent will return nullptr (or boost::none or whatever), which will cause RequiresIndexStage::doRestoreState… to return QueryPlanKilled
        • This will require us to add a new function Collection::findIndexCatalogEntryByIdent
      • When any class that inherits from RequiresIndexStage (e.g. IndexScan) restores from yield, it will need to make sure that it is not holding onto any data that was owned by the previous Collection / IndexCatalogEntry object. So for example, in IndexScan, we might want to change the methods of SortedDataInterface::Cursor to take in an IndexAccessMethod / SortedDataInterface, so that IndexScan::doWork can always pass in the latest IndexAccessMethod obtained by RequiresIndexStage::indexAccessMethod() as of the most recent restore-from-yield. E.g. you might do something like _indexCursor->seek(indexAccessMethod()); . And then WiredTigerIndexCursorStandard will no longer need to hold onto a const WiredTigerIndex& , which could be invalidated.
        • Or:
          • Get the new IndexAccessMethod from RequiresIndexStage and pass that to any cursors it has for their “restore” methods.
            I.e. _indexCursor->restore(indexAccessMethod())

      This approach assumes that the ident string will be unique, so that if a collection is dropped and recreated with the same name within the lifetime of the process, the ident string for that collection will still be different. The way that we choose ident strings I believe satisfies this constraint.

            Assignee:
            gregory.wlodarek@mongodb.com Gregory Wlodarek
            Reporter:
            matthew.saltz@mongodb.com Matthew Saltz (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: