Fix TSan data races: non-atomic reads of wt_shared fields modified atomically in layered/version-cursor paths


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Concurrency
    • None
    • Storage Engines - Foundations
    • 13.012
    • None
    • None

      Problem

      ThreadSanitizer reported three data races in the switch-mode / layered-storage TSan variant of the format test. In each case a wt_shared field that is written with an atomic primitive is read with a plain (non-atomic) load, which TSan treats as a data race.

      Race 1 – btree->original in __wt_btree_disable_bulk

      btree->original is a wt_shared uint8_t. The early-exit guard in btree_inline.h used a plain read while another thread concurrently performed __wt_atomic_cas_uint8 on the same byte (the one-time CAS that disables bulk-load mode). Two layered insert threads racing on the same freshly-opened B-Tree trigger this.
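The pattern behind Race 1 can be sketched with C11 atomics. This is a minimal illustration, not the WiredTiger code: struct demo_btree and demo_disable_bulk are hypothetical names, and the C11 calls stand in for the __wt_atomic_* primitives. The early-exit guard uses a relaxed atomic load, so it no longer conflicts with the CAS another thread may be performing on the same byte.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the B-Tree handle; only the flag matters here. */
struct demo_btree {
    _Atomic uint8_t original; /* 1 until the one-time transition, then 0 */
};

/*
 * demo_disable_bulk --
 *     Return true if this caller performed the one-time 1 -> 0 transition.
 */
static bool
demo_disable_bulk(struct demo_btree *btree)
{
    uint8_t expected = 1;

    /* Early-exit guard: an atomic (relaxed) load instead of a plain read. */
    if (atomic_load_explicit(&btree->original, memory_order_relaxed) == 0)
        return (false);

    /* One-time CAS: at most one caller ever sees it succeed. */
    return (atomic_compare_exchange_strong(&btree->original, &expected, 0));
}
```

A stale value of 1 from the relaxed load is harmless in this sketch: the caller simply falls through to the CAS, which fails if another thread already made the transition.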

      Race 2 – conn->version_cursor_count read in __wt_txn_pinned_timestamp

      version_cursor_count is a wt_shared uint32_t incremented atomically in __wt_curversion_open (__wt_atomic_add_uint32) and decremented atomically in __curversion_close (__wt_atomic_sub_uint32). The fast-path check in txn_inline.h used a plain load while a drain-worker thread performed an atomic add/sub concurrently.

      Race 3 – conn->version_cursor_count read in __wt_curversion_open

      Same field as Race 2. The check inside the txn-global write-lock section of __wt_curversion_open used a plain load, but __curversion_close decrements the counter atomically without acquiring that lock, so the lock does not prevent the race.
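Races 2 and 3 share one shape: a counter that is modified atomically but was read with a plain load. A minimal C11 sketch of the corrected shape follows. The names struct demo_conn, demo_cursor_open, demo_cursor_close and demo_has_version_cursor are hypothetical, and the C11 calls stand in for __wt_atomic_add_uint32, __wt_atomic_sub_uint32 and __wt_atomic_load_uint32_relaxed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the connection; only the counter matters here. */
struct demo_conn {
    _Atomic uint32_t version_cursor_count;
};

/* Opening a version cursor increments the counter atomically. */
static void
demo_cursor_open(struct demo_conn *conn)
{
    atomic_fetch_add(&conn->version_cursor_count, 1);
}

/* Closing a version cursor decrements it atomically, without taking any lock. */
static void
demo_cursor_close(struct demo_conn *conn)
{
    atomic_fetch_sub(&conn->version_cursor_count, 1);
}

/* Fast-path check: a relaxed atomic load instead of a plain read. */
static bool
demo_has_version_cursor(struct demo_conn *conn)
{
    return (atomic_load_explicit(&conn->version_cursor_count, memory_order_relaxed) > 0);
}
```

Because the decrement in this sketch never takes a lock, holding a lock around the read would not remove the conflict; only making the read itself atomic does.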

      Fix

      Replace each plain read with the matching __wt_atomic_load_*_relaxed call:

      File | Old | New
      src/include/btree_inline.h | btree->original | __wt_atomic_load_uint8_relaxed(&btree->original)
      src/include/txn_inline.h | S2C(session)->version_cursor_count > 0 | __wt_atomic_load_uint32_relaxed(&S2C(session)->version_cursor_count) > 0
      src/cursor/cur_version.c | conn->version_cursor_count == 0 | __wt_atomic_load_uint32_relaxed(&conn->version_cursor_count) == 0

      Relaxed ordering is sufficient in all three cases: the checks are advisory fast-paths or one-shot transition guards, and the subsequent CAS / rwlock operations provide the necessary acquire/release ordering for dependent side-effects.
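The one-shot guard argument can be exercised with a small threaded sketch. It assumes POSIX threads and C11 atomics; demo_flag, demo_winners, demo_worker and demo_run are hypothetical names, not WiredTiger symbols. Several threads do a relaxed pre-check and then attempt the CAS, and the CAS alone decides which thread performs the transition.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_THREADS 8

static _Atomic uint8_t demo_flag = 1;     /* one-shot state: 1 -> 0 exactly once */
static _Atomic uint32_t demo_winners = 0; /* threads that performed the transition */

static void *
demo_worker(void *arg)
{
    uint8_t expected = 1;

    (void)arg;

    /* Advisory relaxed check: a stale 1 only costs a failed CAS below. */
    if (atomic_load_explicit(&demo_flag, memory_order_relaxed) == 0)
        return (NULL);

    /* The CAS (sequentially consistent by default) picks the single winner. */
    if (atomic_compare_exchange_strong(&demo_flag, &expected, 0))
        atomic_fetch_add(&demo_winners, 1);
    return (NULL);
}

/* Reset the state, run the workers, and return how many made the transition. */
static uint32_t
demo_run(void)
{
    pthread_t tids[DEMO_THREADS];
    int i;

    atomic_store(&demo_flag, 1);
    atomic_store(&demo_winners, 0);
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (i = 0; i < DEMO_THREADS; i++)
        pthread_join(tids[i], NULL);
    return (atomic_load(&demo_winners));
}
```

However the threads interleave, demo_run() returns 1: the relaxed load can only skip work that is already done, and never lets a second thread succeed at the CAS. Built with a thread-sanitizing compiler flag such as -fsanitize=thread, a sketch of this shape should not be reported, since every access to the shared byte is atomic.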

            Assignee:
            Sid Mahajan
            Reporter:
            Sid Mahajan
