SERVER-61097 (Core Server)

SizeStorer can cause deadlocks with cache eviction

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.2.0, 5.0.6, 4.4.15, 4.2.21
    • Affects Version/s: 4.2.0, 4.4.0
    • Component/s: None
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Backport Requested: v5.0, v4.4, v4.2
    • Sprint: Execution Team 2021-11-15, Execution Team 2021-11-29
    • 19

      Note: This ticket does not fully fix the deadlock described. A complete fix was introduced in SERVER-60334.

      This is a follow-up to WT-8245.

      There's a mutex inside the SizeStorer that serializes access to a global WT session and cursor that we keep open forever. We let multiple threads share it, which is where the mutex comes in. In general, it's not a good idea to hold an exclusive lock and call into the storage engine.

      The larger problem is that the SizeStorer uses a WT_SESSION that is not the one owned by the calling operation, which may also have its own WT_SESSION.

      In practice, this has only shown up in importCollection. After the operation has performed a catalog write, it gets stuck inside of SizeStorer::load(), holds this mutex, and blocks on cache eviction. WiredTiger will roll back transactions that have written data, but it will not roll back read-only transactions. WiredTiger cannot roll back the SizeStorer::load() call because the SizeStorer uses an entirely separate WT_SESSION from the one that importCollection uses. So even though importCollection has written data, it cannot be rolled back even if it is causing cache issues.
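
To make the reasoning above concrete, here is a toy C++ model of the rollback decision. It is only a sketch: SessionSketch, ImportScenario, and evictionCanRollBack are hypothetical names invented for illustration, not server or WiredTiger code. The single rule it encodes is the one stated above: a blocked transaction is a rollback candidate only if it has written data.

```cpp
#include <string>

// Hypothetical stand-in for a WT_SESSION; it tracks only what matters here.
struct SessionSketch {
    std::string owner;
    bool txnHasWrites;  // has this session's active transaction written data?
};

// Models the rule from the description: a transaction blocked on cache
// eviction can be rolled back only if it has written data.
bool evictionCanRollBack(const SessionSketch& blockedSession) {
    return blockedSession.txnHasWrites;
}

// Hypothetical model of importCollection calling into the SizeStorer.
struct ImportScenario {
    SessionSketch operationSession{"importCollection", false};
    SessionSketch sizeStorerSession{"SizeStorer (global)", false};

    // The catalog write happens on the operation's own session.
    void catalogWrite() { operationSession.txnHasWrites = true; }

    // The load runs on the global session, so that is the session that
    // ends up blocked on cache eviction.
    const SessionSketch& sessionBlockedInLoad() const { return sizeStorerSession; }
};

// Returns whether the blocked call could be rolled back in this model.
bool importCanBeRolledBack() {
    ImportScenario scenario;
    scenario.catalogWrite();
    return evictionCanRollBack(scenario.sessionBlockedInLoad());
}
```

In this model importCanBeRolledBack() returns false: the writes live on operationSession, but the session that is actually blocked is the read-only global one.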

      Using more than one WT_SESSION per thread is a bug that we've seen before.

      We should just get rid of this global session + cursor and require that callers pass their own OperationContext. If that's not possible for some reason, we'll need to use "cache_max_wait_ms" to allow the operation to time itself out.
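
A rough C++ sketch of the shape this proposal could take, under stated assumptions. OperationContextSketch, SizeStorerSketch, sessionForLoad, and sessionConfigWithCacheTimeout are hypothetical names, and the real OperationContext and SizeStorer interfaces are not reproduced here. The only identifier taken from this ticket is the "cache_max_wait_ms" option name; which WiredTiger call would accept the resulting string is not specified here, so the helper only builds the string.

```cpp
#include <string>

// Hypothetical stand-in for a WT_SESSION.
struct SessionSketch {
    bool txnHasWrites;
};

// Hypothetical stand-in for OperationContext: one session per operation.
struct OperationContextSketch {
    SessionSketch session{false};
};

class SizeStorerSketch {
public:
    // No global session, cursor, or mutex: the caller's own session does the
    // read, so it is also the session that would block on cache eviction.
    const SessionSketch& sessionForLoad(OperationContextSketch& opCtx,
                                        const std::string& ident) const {
        (void)ident;  // a real load would look up the size record for ident
        return opCtx.session;
    }
};

// Fallback from the description: build a configuration string that bounds
// how long the operation waits on the cache (value in milliseconds).
std::string sessionConfigWithCacheTimeout(int maxWaitMs) {
    return "cache_max_wait_ms=" + std::to_string(maxWaitMs);
}
```

With this shape, an operation that has already written data blocks on the same session that holds its writes, so the write-based rollback rule can apply to it.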

            gregory.wlodarek@mongodb.com Gregory Wlodarek
            louis.williams@mongodb.com Louis Williams