  1. Core Server
  2. SERVER-81493

Handle StorageUnavailableException when resetting WiredTiger cursors

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.2.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage Execution NAMER
    • Fully Compatible
    • ALL
    • v7.0
    • Execution NAMR Team 2023-10-16
    • 120

      A call to WT_CURSOR::reset can fail with a rollback due to cache pressure. There is a long-standing assumption (since at least 3.6) that ignoring these rollbacks is safe because the transaction is being killed anyway. This assumption is incorrect: query plans reset the cursor before performing a write (e.g. in the update stage). When the exception is swallowed, the write proceeds and eventually fails to commit because the transaction requires rollback.
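
The failure mode described above can be modeled in a few lines. This is a minimal sketch, not server code: `StorageUnavailable`, `ToyTransaction`, `ToyCursor`, `saveSwallowing`, and `updateWithSwallowedReset` are all hypothetical stand-ins invented for illustration, and the real WiredTiger and server types behave in more complex ways.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the server's StorageUnavailableException.
struct StorageUnavailable : std::runtime_error {
    StorageUnavailable() : std::runtime_error("rolled back due to cache pressure") {}
};

// Toy transaction: once it has been rolled back, it can no longer commit.
struct ToyTransaction {
    bool rolledBack = false;
    int writesApplied = 0;
    bool commit() const { return !rolledBack; }
};

// Toy cursor whose reset() may roll back the owning transaction.
struct ToyCursor {
    ToyTransaction& txn;
    bool cachePressure;
    void reset() {
        if (cachePressure) {
            txn.rolledBack = true;
            throw StorageUnavailable();
        }
    }
};

// Models the problematic pattern: the exception from reset() is swallowed.
void saveSwallowing(ToyCursor& cursor) {
    try {
        cursor.reset();
    } catch (const StorageUnavailable&) {
        // Ignored on the assumption that the transaction is being killed anyway.
    }
}

// Models an update stage: save the cursor, perform the write, then commit.
bool updateWithSwallowedReset(bool cachePressure, ToyTransaction& txn) {
    ToyCursor cursor{txn, cachePressure};
    saveSwallowing(cursor);
    ++txn.writesApplied;  // the write still proceeds on a doomed transaction
    return txn.commit();  // the failure only surfaces here, at commit time
}
```

In this model, calling `updateWithSwallowedReset(true, txn)` applies the write and then returns `false` from the commit, which mirrors how the error surfaces far from the reset that caused it.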

      I've linked a build failure where a replica set reconfig raced with a test designed to generate a transaction too large to fit in cache. WiredTiger rolled back the oldest transaction to ease the cache pressure. The oldest transaction happened to belong to the reconfig thread persisting the new configuration, and because the exception was not handled, an invariant eventually failed when the thread tried to commit the transaction.

      We should stop swallowing StorageUnavailableExceptions in WiredTigerRecordStoreCursorBase::save() and WiredTigerRecordStore::RandomCursor::save(), and instead handle the exception appropriately further up the call chain.
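
One way to picture the proposed direction is a save() that lets the exception escape, paired with a caller that abandons the attempt and retries. This is only a sketch under stated assumptions: `StorageUnavailable`, `ToyCursor`, `savePropagating`, and `runWithRetry` are hypothetical names, and the ticket does not say that a retry loop is how the server actually handles the exception.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the server's StorageUnavailableException.
struct StorageUnavailable : std::runtime_error {
    StorageUnavailable() : std::runtime_error("rolled back due to cache pressure") {}
};

// Toy cursor whose reset() fails a fixed number of times before succeeding.
struct ToyCursor {
    int failuresRemaining;
    void reset() {
        if (failuresRemaining > 0) {
            --failuresRemaining;
            throw StorageUnavailable();
        }
    }
};

// save() no longer catches the exception; it propagates to the caller.
void savePropagating(ToyCursor& cursor) {
    cursor.reset();
}

// Hypothetical caller-side handling: abandon the failed attempt and retry.
// Returns the number of attempts used, or -1 if every attempt failed.
int runWithRetry(const std::function<void()>& op, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        try {
            op();
            return attempt;
        } catch (const StorageUnavailable&) {
            // A real caller would abort the storage transaction here
            // before starting the next attempt.
        }
    }
    return -1;
}
```

Because the exception escapes before the write runs, a failed attempt never performs the write, so the operation is applied only by an attempt that succeeds.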

      We should also investigate whether callers of WiredTigerIndexCursorGeneric::resetCursor() and PlanYieldPolicy::yieldOrInterrupt() are similarly affected.

            Assignee:
            louis.williams@mongodb.com Louis Williams
            Reporter:
            josef.ahmad@mongodb.com Josef Ahmad
            Votes:
            0
            Watchers:
            9
