  Core Server / SERVER-65083

Thread seeing different set of indices can incorrectly re-use an SBE plan cache entry

    • Fully Compatible
    • ALL
    • v6.0
    • QE 2022-04-04, QE 2022-04-18, QE 2022-05-02
    • 42

      In SERVER-60066 we attempted to build a versioning mechanism for the SBE plan cache. The idea was to encode a monotonically increasing counter in the SBE plan cache key, and bump this counter in the collection catalog whenever an index is dropped or added. This would ensure that all queries seeing the same version of the Collection object would also see the same set of indices, and that any matching SBE plan cache entry would be consistent with this set of indexes.
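
      The versioning idea can be illustrated with a minimal Python sketch. This is not the server's code: the names PlanCacheKey, CollectionCatalogEntry, and make_key are hypothetical, and the model assumes the counter lives on a single catalog object and is bumped on every index add or drop.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanCacheKey:
    """Hypothetical cache key: query shape plus the catalog version counter."""
    query_shape: str
    catalog_version: int


@dataclass
class CollectionCatalogEntry:
    """Hypothetical stand-in for the Collection object in the catalog."""
    index_names: set = field(default_factory=set)
    version: int = 0

    def add_index(self, name):
        self.index_names.add(name)
        self.version += 1  # bump on every index add

    def drop_index(self, name):
        self.index_names.remove(name)
        self.version += 1  # bump on every index drop


def make_key(query_shape, coll):
    # The version is encoded in the key, so entries cached under an older
    # version can no longer match once the set of indexes changes.
    return PlanCacheKey(query_shape, coll.version)


coll = CollectionCatalogEntry()
key_before = make_key("find {a: 1}", coll)
coll.add_index("b_1")
key_after = make_key("find {a: 1}", coll)
```

      In this model, key_before and key_after differ only in the version, which is enough to keep a query planned after the index change from matching an entry cached before it.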

      However, this implementation did not fully ensure the correct behavior of the SBE plan cache when there are concurrent index builds or drops. The problem is that readers may read at a timestamp prior to the completion of an index build. When this occurs, the index catalog code hides the indexes that were not ready at the reader's timestamp. Such readers nevertheless always see the latest Collection object. This creates a situation where two readers can share a Collection version yet see different views of the index catalog.
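
      A small Python sketch of the timestamp-based hiding shows how two readers on the same Collection version can disagree about the index catalog. The names IndexEntry and visible_index_names are hypothetical, timestamps are plain integers, and treating a read exactly at the minimum visible timestamp as "visible" is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index catalog entry."""
    name: str
    min_visible_ts: int  # readers at earlier timestamps must not see the index


def visible_index_names(indexes, read_ts):
    # Hide every index that was not yet ready at the reader's timestamp.
    # Using >= at the boundary is an assumption made for this sketch.
    return {ix.name for ix in indexes if read_ts >= ix.min_visible_ts}


T1, T2, T3 = 10, 20, 30
indexes = [IndexEntry("a_1", 0), IndexEntry("b_1", T2)]

# Both readers use the latest Collection object, so they share one version.
collection_version = 7

r1_view = visible_index_names(indexes, T1)  # reads before the build completed
r2_view = visible_index_names(indexes, T3)  # reads after the build completed
```

      Here r1_view contains only a_1 while r2_view contains both indexes, even though both readers would encode the same collection_version in their plan cache keys.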

      The consequence is that an SBE plan cache entry can be incorrectly reused and trip a tassert(). Here is a step-by-step description of one possible scenario:

      • A new index is constructed with a minimum visible snapshot timestamp of t2.
      • Around the same time, two read transactions start. Let's name the readers r1 and r2.
      • r1 decides to read at timestamp t1 such that t1 < t2.
      • r2 decides to read at timestamp t3 such that t3 > t2.
      • r2 chooses a plan using the newly constructed index, which it can see because its read timestamp is greater than the minimum visible snapshot of the index. It caches this plan.
      • For whatever reason, r1 is running more slowly than r2 (e.g. maybe the thread was unscheduled by the operating system). It consults the plan cache and finds the cache entry constructed by r2. However, r1 cannot see the index because it is reading at a time where the index does not exist on disk. The catalog code makes sure to hide such indexes.
      • When r1 tries to recover the plan from the cache, it tries to look up the index by name and asserts that the index exists. This assertion trips because the index is not visible in r1's snapshot.
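
      The scenario above can be replayed in a few lines of Python. This is a simplified model under stated assumptions, not the server's implementation: the plan cache is a dict, the cached "plan" is just the name of the index it uses, InvariantFailure stands in for a tripped tassert(), and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    name: str
    min_visible_ts: int  # index is hidden from readers at earlier timestamps


class InvariantFailure(Exception):
    """Stand-in for a tripped tassert()."""


def visible_index_names(indexes, read_ts):
    return {ix.name for ix in indexes if read_ts >= ix.min_visible_ts}


def recover_plan(cached_index_name, indexes, read_ts):
    # Recovering a cached plan looks the index up by name in the reader's view
    # of the catalog and asserts that it exists.
    if cached_index_name not in visible_index_names(indexes, read_ts):
        raise InvariantFailure(f"index {cached_index_name!r} not found")
    return cached_index_name


T1, T2, T3 = 10, 20, 30
indexes = [IndexEntry("a_1", 0), IndexEntry("b_1", T2)]
collection_version = 7  # shared by r1 and r2: both see the latest Collection
plan_cache = {}

# r2 reads at T3, can see b_1, plans with it, and caches the plan.
plan_cache[("find {b: 5}", collection_version)] = "b_1"

# r1 reads at T1, builds the same key, and gets a cache hit for r2's plan.
r1_hit = plan_cache.get(("find {b: 5}", collection_version))
try:
    recover_plan(r1_hit, indexes, T1)
    r1_outcome = "ok"
except InvariantFailure:
    r1_outcome = "tassert tripped"
```

      The key matches because it only encodes the query shape and the Collection version; nothing in it reflects which indexes r1 can actually see at t1.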

      We've also seen the bug manifest in a slightly different way:

      • The test kicks off a rebuild of index {b: 1}, which will get committed with a minValidSnapshot of t2.
      • Meanwhile, two readers r1 and r2 start. r1 is reading at a timestamp of t1 < t2 and r2 is reading at a timestamp of t3 > t2.
      • r1 ends up running first, for whatever reason. Because it cannot see the index {b: 1}, there is only one possible query plan.
      • r2 starts running, but this query can see both indexes. It consults the plan cache, finds nothing, and then starts multi-planning.
      • r1 gets far enough along to insert a pinned cache entry. The cache entry is created as pinned because r1 only generates one query plan, and single solution plans are pinned.
      • r2 finishes multi-planning and tries to cache the result. This will override any pre-existing cache entry. At this point, we see a pinned cache entry and trip a tassert().
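
      The second manifestation is an interleaving problem on a single cache key, which the following Python sketch replays sequentially. The PlanCache class, its method names, and the exact condition checked are assumptions made for illustration; the description above only states that a pinned entry is found when the multi-planned result is cached and that a tassert() trips.

```python
from dataclasses import dataclass


class InvariantFailure(Exception):
    """Stand-in for a tripped tassert()."""


@dataclass
class CacheEntry:
    plan: str
    pinned: bool


class PlanCache:
    """Hypothetical plan cache keyed by (query shape, Collection version)."""

    def __init__(self):
        self._entries = {}

    def get(self, key):
        return self._entries.get(key)

    def put_single_solution(self, key, plan):
        # Single-solution plans are cached as pinned entries.
        self._entries[key] = CacheEntry(plan, pinned=True)

    def put_multi_planned(self, key, winning_plan):
        # Assumed invariant: a query that had to multi-plan should never
        # find a pinned entry under its own key when it caches the winner.
        existing = self._entries.get(key)
        if existing is not None and existing.pinned:
            raise InvariantFailure("unexpected pinned cache entry")
        self._entries[key] = CacheEntry(winning_plan, pinned=False)


cache = PlanCache()
key = ("find {b: 5}", 7)  # same key for r1 and r2: same shape, same version

r2_initial_lookup = cache.get(key)  # r2 finds nothing, starts multi-planning
cache.put_single_solution(key, "only plan visible to r1")  # r1 pins its plan
try:
    cache.put_multi_planned(key, "plan using b_1")  # r2 caches its winner
    r2_outcome = "ok"
except InvariantFailure:
    r2_outcome = "tassert tripped"
```

      As in the first scenario, the root cause in this model is that r1 and r2 map to the same cache key while planning against different sets of visible indexes.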

            Assignee:
            david.storch@mongodb.com David Storch
            Reporter:
            david.storch@mongodb.com David Storch