Transaction state dump still overflows fixed buffer and panics eviction thread (ERANGE) after WT-16954


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Logging
    • Storage Engines - Transactions
    • SE Transactions - 2026-06-19

      Symptom

      In a disaggregated Atlas cluster (TSBS load), the eviction-server thread panicked with error 34 (ERANGE, "Numerical result out of range") while running the cache-stuck per-session transaction-state dump. The ERANGE propagates to WT_RET_PANIC in the eviction thread run loop, turning a recoverable "cache stuck" diagnostic dump into a fatal WT_PANIC + fassert (mongod crash + restart). Originally surfaced via AF-17533 (MongoDB 9.0.0-rc1010).

      Root cause

      The per-transaction detail line in __wt_verbose_dump_txn_one() (src/txn/txn.c) is formatted into a buffer sized:

      buf_len = (uint32_t)snapshot_buf->size + 512;
      if (txn_err_info->err_msg != NULL)
          buf_len += strlen(txn_err_info->err_msg);
      WT_ERR(__wt_scr_alloc(session, buf_len, &buf));
      WT_ERR(__wt_snprintf((char *)buf->data, buf_len, "transaction id: ...", ...));
      

      WT-16954 made the snapshot list and the last-saved error message dynamic, but left every other field under the fixed 512-byte slack. Measured against the verbatim format string:

      • Literal field labels (all % specifiers removed): 390 bytes (always present).
      • Variable fields excluding snapshot + err_msg, worst case: 338 bytes (six timestamps up to 25 bytes each via WT_TS_INT_STRING_SIZE, several uint64 IDs up to 20 digits, a 32-byte LSN string WT_MAX_LSN_STRING, the 21-byte WT_ISO_READ_COMMITTED tag, two error codes).
      • Total non-snapshot worst case = 390 + 338 = 728 bytes > 512 (over by 216).
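      The budget above can be written down as a small arithmetic check. This is only a sketch: the constants are the byte counts quoted in this ticket, and the enum and function names are hypothetical, not WiredTiger identifiers.

```c
#include <stddef.h>

/* Byte counts quoted in this ticket; the names are illustrative only. */
enum {
    DETAIL_LABEL_BYTES = 390,       /* literal field labels, always present */
    DETAIL_FIELD_WORST_BYTES = 338, /* variable fields, worst case, excluding snapshot and err_msg */
    DETAIL_FIXED_SLACK = 512        /* fixed slack added to snapshot_buf->size */
};

/* Worst-case size of the non-snapshot part of the detail line. */
size_t
detail_worst_case(void)
{
    return ((size_t)DETAIL_LABEL_BYTES + (size_t)DETAIL_FIELD_WORST_BYTES);
}

/* Bytes by which the worst case exceeds the fixed slack (0 if it fits). */
size_t
detail_overrun(void)
{
    size_t worst = detail_worst_case();

    return (worst > (size_t)DETAIL_FIXED_SLACK ? worst - (size_t)DETAIL_FIXED_SLACK : 0);
}
```

      With the ticket's numbers, detail_worst_case() yields 728 and detail_overrun() yields 216, matching the figures in the list above.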

      The decisive detail: a reader session with all timestamps (0,0) and tiny IDs already consumes ~481 of the 512 bytes of slack, leaving only ~31 bytes of headroom. The active writer session (oldest pinned txn, populated commit/durable/read timestamps, a real checkpoint LSN) trivially exceeds that margin, so the formatted line no longer fits in buf_len and __wt_snprintf returns ERANGE.
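      The failure mode can be reproduced in isolation with standard C. The sketch below is a stand-in, not the WiredTiger source: checked_snprintf is a hypothetical helper that reports ERANGE when the formatted output does not fit, which is the behavior this ticket attributes to __wt_snprintf. The EINVAL return for an encoding error is an assumption made for the example.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a truncation-checking snprintf.
 * Returns 0 on success, ERANGE if the output (plus the trailing NUL)
 * does not fit in size bytes, and EINVAL on a formatting error.
 */
int
checked_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(buf, size, fmt, ap); /* returns the length the full output needs */
    va_end(ap);

    if (len < 0)
        return (EINVAL);
    /* len excludes the NUL, so len == size already means truncation. */
    return ((size_t)len >= size ? ERANGE : 0);
}
```

      A line that fits returns 0; one that is even a single byte too long returns ERANGE. A caller that treats every non-zero return as fatal therefore turns a too-small buffer into a hard failure, which is the path described in the Symptom section.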

      Why disaggregated triggers it

      The format string is unchanged, so the overflow is not caused by any single field getting longer. Disaggregated keeps a checkpoint effectively always running (real ckpt_lsn) alongside a timestamped long-running transaction, so every field on the active session's line is populated at once and the line deterministically exceeds the ~31-byte margin. Classic clusters rarely have ckpt_lsn and the timestamps populated together, so the margin usually survived.

      Suggested fix

      Make the detail line overflow-proof rather than tuning the constant. Preferred: build the line in a growable scratch buffer using __wt_buf_catfmt (the same pattern already used for snapshot_buf), eliminating the fixed-size __wt_snprintf call entirely. Minimal alternative: size buf_len from the true worst-case label + field budget instead of 512.
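      The preferred approach can be illustrated with a minimal, self-contained sketch of the "measure, grow, then append" pattern. It uses only the C standard library; struct grow_buf and grow_buf_catfmt are hypothetical names standing in for a WiredTiger scratch buffer and __wt_buf_catfmt, whose actual implementation is not reproduced here.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable buffer; zero-initialize before first use. */
struct grow_buf {
    char *data; /* NUL-terminated contents, or NULL when empty */
    size_t len; /* bytes used, excluding the trailing NUL */
    size_t cap; /* bytes allocated */
};

/* Append formatted text, growing the buffer as needed. Returns 0 or an errno value. */
int
grow_buf_catfmt(struct grow_buf *b, const char *fmt, ...)
{
    va_list ap;
    size_t want, ncap;
    char *p;
    int need;

    /* First pass: ask vsnprintf how many bytes the output requires. */
    va_start(ap, fmt);
    need = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (need < 0)
        return (EINVAL);

    /* Grow geometrically until the new text plus the NUL fits. */
    want = b->len + (size_t)need + 1;
    if (want > b->cap) {
        ncap = b->cap == 0 ? 64 : b->cap;
        while (ncap < want)
            ncap *= 2;
        if ((p = realloc(b->data, ncap)) == NULL)
            return (ENOMEM);
        b->data = p;
        b->cap = ncap;
    }

    /* Second pass: format into the space that is now guaranteed to be large enough. */
    va_start(ap, fmt);
    (void)vsnprintf(b->data + b->len, b->cap - b->len, fmt, ap);
    va_end(ap);
    b->len += (size_t)need;
    return (0);
}
```

      Because the buffer is sized from the formatted length itself, no combination of populated fields can produce ERANGE; the only remaining failure is allocation failure. Each field of the detail line could be appended with its own call, so adding a field later does not require re-deriving a size constant.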

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Shoufu Du
