Workgen: Fix _ts_mutex concurrency bug blocking the timestamp thread


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Test wtperf
    • None
    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2026-03-13
    • 3

      Summary

      While investigating cache-stuck behavior in disaggregated storage workloads (WT-16817), I found that the apparent “stuck timestamps” were caused by a locking/concurrency bug in Workgen, not in WiredTiger itself. Workgen can hold _ts_mutex while calling session->prepare_transaction or session->commit_transaction. If either call blocks, the timestamp thread cannot acquire the mutex and therefore cannot advance the stable/oldest timestamps.

      Problem

      • Workgen uses a shared mutex _ts_mutex to protect its timestamp state.
      • In the transaction path, Workgen calls the following while holding _ts_mutex:
        • session->prepare_transaction
        • session->commit_transaction
      • Under load (e.g., cache pressure), these calls can block. If they do, the worker thread remains blocked while still holding _ts_mutex.
      • The timestamp thread also needs _ts_mutex to compute and publish new stable/oldest timestamps.
      • Once a worker is stuck in prepare/commit with _ts_mutex held, the timestamp thread cannot acquire _ts_mutex, and stable/oldest timestamps stop moving.
      • This manifests as an apparent WiredTiger “cache-stuck / timestamp-stuck” scenario, but it is actually an artifact of Workgen's locking.

      Root Cause

      • _ts_mutex critical sections were too broad:
        • They wrapped both timestamp computation (WorkgenTimeStamp::get_timestamp[_lag]) and potentially blocking WT API calls.
      • This creates a lock-holding/blocking pattern where a stalled WT call can starve the timestamp thread.
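
The starvation pattern can be reproduced in isolation. The following is a minimal simulation, not Workgen code: `ts_mutex`, `next_ts`, and `timestamp_thread_can_advance` are hypothetical names, a `std::timed_mutex` stands in for _ts_mutex (whose real type may differ), and a blocked WiredTiger call is modeled by a worker waiting on a future. The simulated timestamp thread uses a timed lock attempt so the sketch reports the stall instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <mutex>
#include <thread>

// Returns true if a simulated timestamp thread could take the mutex while a
// simulated worker was blocked inside a fake WiredTiger call.
// hold_lock_across_call == true models the overly broad critical section.
bool timestamp_thread_can_advance(bool hold_lock_across_call) {
    std::timed_mutex ts_mutex;        // hypothetical stand-in for _ts_mutex
    uint64_t next_ts = 1;             // hypothetical shared timestamp state
    std::promise<void> entered_call;  // worker -> caller: "inside the call"
    std::promise<void> unblock_call;  // caller -> worker: "the call may return"
    std::future<void> entered = entered_call.get_future();
    std::future<void> unblock = unblock_call.get_future();

    std::thread worker([&] {
        std::unique_lock<std::timed_mutex> lock(ts_mutex);
        assert(lock.owns_lock());        // timestamp computation needs the lock
        uint64_t commit_ts = next_ts++;  // stands in for get_timestamp()
        if (!hold_lock_across_call)
            lock.unlock();               // narrowed scope: release before the call
        entered_call.set_value();
        unblock.wait();                  // stands in for a blocked commit_transaction
        (void)commit_ts;
    });

    entered.wait();
    // The "timestamp thread": give up after a short timeout rather than hang.
    bool acquired = ts_mutex.try_lock_for(std::chrono::milliseconds(100));
    if (acquired) {
        ++next_ts;  // a real timestamp thread would publish stable/oldest here
        ts_mutex.unlock();
    }
    unblock_call.set_value();  // let the fake call return so the worker exits
    worker.join();
    return acquired;
}
```

With `hold_lock_across_call` set to `true` the timed lock attempt fails for as long as the worker is blocked, which matches the stalled stable/oldest timestamps described above. With `false` the attempt is expected to succeed even though the worker is still blocked.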

      Proposed Fix

      • Narrow the scope of _ts_mutex to only cover timestamp computation:
        • Hold _ts_mutex just long enough to call WorkgenTimeStamp::get_timestamp[_lag](...).
        • Release _ts_mutex before calling any WiredTiger APIs:
          • conn->set_timestamp
          • session->prepare_transaction
          • session->commit_transaction
      • Commit path changes:
        • For use_prepare_timestamp:
          • Take _ts_mutex, compute prepare_ts, release _ts_mutex.
          • Call prepare_transaction with prepare_timestamp=... (no _ts_mutex held).
          • Call commit_transaction with commit_timestamp=..., durable_timestamp=... (no _ts_mutex held).
        • For use_commit_timestamp:
          • Take _ts_mutex, compute commit_ts, release _ts_mutex.
          • Call commit_transaction with commit_timestamp=... (no _ts_mutex held).
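
The narrowed commit path might be structured roughly as follows. This is a sketch under stated assumptions, not the actual patch: `MockSession`, `LoggedMutex`, `fake_get_timestamp`, `hex`, and the two `commit_*` functions are hypothetical stand-ins that only record the order of operations, and reusing the prepare timestamp for the commit and durable timestamps is an illustrative choice because the ticket elides those values.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical event log that makes the lock scope observable in this sketch.
static std::vector<std::string> events;

// Mutex wrapper (stand-in for _ts_mutex) that records lock/unlock events.
struct LoggedMutex {
    std::mutex m;
    void lock() { m.lock(); events.push_back("lock"); }
    void unlock() { events.push_back("unlock"); m.unlock(); }
};

// Mock session: records each call instead of talking to WiredTiger.
struct MockSession {
    int prepare_transaction(const std::string &cfg) {
        events.push_back("prepare_transaction(" + cfg + ")");
        return 0;
    }
    int commit_transaction(const std::string &cfg) {
        events.push_back("commit_transaction(" + cfg + ")");
        return 0;
    }
};

// Stand-in for WorkgenTimeStamp::get_timestamp(); the real function differs.
static uint64_t fake_get_timestamp() { static uint64_t ts = 100; return ++ts; }

// Format a timestamp as a hex string for a config value.
static std::string hex(uint64_t ts) {
    char buf[17];  // up to 16 hex digits plus the terminating NUL
    int n = std::snprintf(buf, sizeof(buf), "%" PRIx64, ts);
    assert(n > 0 && n < (int)sizeof(buf));
    return std::string(buf, n);
}

// use_prepare_timestamp path: the lock covers only the timestamp computation.
int commit_with_prepare(MockSession &s, LoggedMutex &ts_mutex) {
    uint64_t prepare_ts;
    {
        std::lock_guard<LoggedMutex> guard(ts_mutex);
        prepare_ts = fake_get_timestamp();
    }  // mutex released here, before any session call
    int ret = s.prepare_transaction("prepare_timestamp=" + hex(prepare_ts));
    if (ret != 0)
        return ret;
    // Assumption: commit and durable timestamps reuse prepare_ts for illustration.
    return s.commit_transaction("commit_timestamp=" + hex(prepare_ts) +
                                ",durable_timestamp=" + hex(prepare_ts));
}

// use_commit_timestamp path: same shape, without the prepare step.
int commit_with_timestamp(MockSession &s, LoggedMutex &ts_mutex) {
    uint64_t commit_ts;
    {
        std::lock_guard<LoggedMutex> guard(ts_mutex);
        commit_ts = fake_get_timestamp();
    }  // mutex released here, before commit_transaction
    return s.commit_transaction("commit_timestamp=" + hex(commit_ts));
}
```

After either function runs, `events` begins with "lock" and "unlock", followed by the session calls, so no session call is made while the mutex is held.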

            Assignee:
            Ravi Giri
            Reporter:
            Ravi Giri
