WASM JS engine surfaces Interrupted error code instead of the real kill reason when a wasmtime epoch interrupt fires


    • Query Integration
    • ALL

      When a MongoDB operation using a $where predicate is interrupted (e.g. by maxTimeMS expiration), the WASM JS engine returns ErrorCodes::Interrupted (11601) instead of the actual interrupt reason such as MaxTimeMSExpired (50).

      Root Cause

      When maxTimeMS expires, the following sequence occurs:

      1. The OperationContext is marked killed with MaxTimeMSExpired (code 50)
      2. WasmtimeScriptEngine::interrupt() calls scope->kill(), which increments the wasmtime epoch to signal the WASM engine
      3. The WASM engine fires a wasm trap: interrupt
      4. MozJSWasmBridge::_callFunc in src/mongo/scripting/mozjs/wasm/bridge/bridge.h catches the trap and, seeing isKillPending() == true, unconditionally throws ErrorCodes::Interrupted with the raw wasmtime backtrace message

      At step 4, the bridge has no reference to the OperationContext and cannot consult the actual kill reason. It always emits Interrupted regardless of whether the kill was caused by maxTimeMS, an explicit killOp, or any other reason.
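
      The loss of the kill reason in the sequence above can be illustrated with a minimal, self-contained model. This is a sketch only: FakeOpCtx, FakeBridge, CodedError, and observedCode are hypothetical stand-ins invented for illustration, not the real MongoDB or wasmtime types, and a plain boolean stands in for the epoch increment.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Numeric values taken from the ticket: MaxTimeMSExpired = 50, Interrupted = 11601.
enum class Code { OK = 0, MaxTimeMSExpired = 50, Interrupted = 11601 };

// Hypothetical exception type carrying an error code.
struct CodedError : std::runtime_error {
    Code code;
    CodedError(Code c, const std::string& msg) : std::runtime_error(msg), code(c) {}
};

// Hypothetical stand-in for OperationContext: it records why it was killed.
struct FakeOpCtx {
    Code killCode = Code::OK;
    void markKilled(Code reason) { killCode = reason; }
};

// Hypothetical stand-in for the bridge: it only knows *that* a kill is pending.
struct FakeBridge {
    bool killPending = false;
    void kill() { killPending = true; }  // stands in for the epoch increment
    void callFunc() const {
        // Assumption mirroring the ticket: the trap is always mapped to Interrupted.
        if (killPending) throw CodedError(Code::Interrupted, "wasm trap: interrupt");
    }
};

// Runs steps 1-4 and returns the error code the caller observes (0 if none).
int observedCode(Code reason) {
    FakeOpCtx opCtx;
    FakeBridge bridge;
    opCtx.markKilled(reason);  // step 1: the real reason lives on the opCtx
    bridge.kill();             // step 2: only a boolean-like signal reaches the bridge
    try {
        bridge.callFunc();     // steps 3-4: trap is caught and rethrown as Interrupted
    } catch (const CodedError& e) {
        return static_cast<int>(e.code);
    }
    return 0;
}
```

      In this model the caller sees the same code whatever reason was recorded on the operation context, because the reason never crosses into the bridge.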

      Impact

      Any code that distinguishes between interrupt reasons — such as tests or retry logic that checks for MaxTimeMSExpired vs. Interrupted — will behave incorrectly when $where is involved. The test jstests/noPassthrough/interruption/max_time_ms_does_not_leak_shard_cursor.js demonstrates this: it expects MaxTimeMSExpired (50) or NetworkInterfaceExceededTimeLimit (202) but receives Interrupted (11601).

      Proposed Fix

      WasmtimeImplScope (in src/mongo/scripting/mozjs/wasm/scope/scope.cpp) already holds a pointer to the OperationContext via _opCtx. In WasmtimeImplScope::invoke(), wrap each bridge call (invokePredicate, invokeFunction, invokeMap) in a try-catch block. On exception, if _bridge->isKillPending() and _opCtx is set, call _opCtx->checkForInterrupt(), which will re-throw with the correct error code (e.g. MaxTimeMSExpired). If the opCtx was not the source of the kill (e.g. a DeadlineMonitor-only timeout), checkForInterrupt() will not throw and the original exception propagates unchanged.

      As a secondary fix, the same try-catch blocks also ensure _deadlineMonitor.stopDeadline() is called on exception paths, where it is currently skipped.
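
      A rough sketch of the proposed try-catch shape follows, using mock types so that it compiles on its own. Only the names checkForInterrupt, isKillPending, invokePredicate, and stopDeadline come from this ticket; FakeOpCtx, FakeBridge, FakeDeadlineMonitor, FakeScope, CodedError, and thrownCode are hypothetical, and the actual change in scope.cpp may look different.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Numeric values taken from the ticket.
enum class Code { MaxTimeMSExpired = 50, Interrupted = 11601 };

// Hypothetical exception type carrying an error code.
struct CodedError : std::runtime_error {
    Code code;
    CodedError(Code c, const std::string& msg) : std::runtime_error(msg), code(c) {}
};

// Hypothetical stand-in for OperationContext.
struct FakeOpCtx {
    bool killed = false;
    Code killCode = Code::Interrupted;
    // Throws with the recorded kill reason; returns normally if not killed.
    void checkForInterrupt() const {
        if (killed) throw CodedError(killCode, "operation was interrupted");
    }
};

// Hypothetical stand-in for the bridge, which always reports Interrupted.
struct FakeBridge {
    bool killPending = false;
    bool isKillPending() const { return killPending; }
    bool invokePredicate() const {
        if (killPending) throw CodedError(Code::Interrupted, "wasm trap: interrupt");
        return true;
    }
};

// Hypothetical stand-in for the deadline monitor; counts stopDeadline() calls.
struct FakeDeadlineMonitor {
    int stops = 0;
    void stopDeadline() { ++stops; }
};

struct FakeScope {
    FakeOpCtx* opCtx = nullptr;
    FakeBridge bridge;
    FakeDeadlineMonitor deadlineMonitor;

    bool invoke() {
        try {
            bool result = bridge.invokePredicate();
            deadlineMonitor.stopDeadline();
            return result;
        } catch (...) {
            // Secondary fix: stop the deadline on the exception path as well.
            deadlineMonitor.stopDeadline();
            if (bridge.isKillPending() && opCtx) {
                // Throws the real kill reason if the opCtx was the source of the kill.
                opCtx->checkForInterrupt();
            }
            throw;  // otherwise the original exception propagates unchanged
        }
    }
};

// Returns the error code thrown by invoke(), or 0 if it completed normally.
int thrownCode(FakeScope& scope) {
    try {
        scope.invoke();
    } catch (const CodedError& e) {
        return static_cast<int>(e.code);
    }
    return 0;
}
```

      With a killed opCtx the caller now observes the recorded reason; when the opCtx was not killed (the DeadlineMonitor-only case), the original Interrupted error still surfaces, and stopDeadline() runs on both paths.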

      Steps to Reproduce

      Run jstests/noPassthrough/interruption/max_time_ms_does_not_leak_shard_cursor.js on a build with the WASM JS engine enabled. The first test block (the $where + maxTimeMS case) will fail with:

      Error: [11601] and [[ 50, 202 ]] are not equal

      Affected Components

      • src/mongo/scripting/mozjs/wasm/bridge/bridge.h — _callFunc throws Interrupted unconditionally
      • src/mongo/scripting/mozjs/wasm/scope/scope.cpp — invoke() does not re-check opCtx kill status after a WASM interrupt

            Assignee:
            Unassigned
            Reporter:
            Calvin Nguyen