Redesign checkpoint eviction threshold configuration API to avoid reconfig lock timeouts

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines - Transactions
    • 5

      This follows up on WT-16613, where we saw that holding a reconfig lock for `_checkpoint_update_evict_triggers_start` and `_checkpoint_update_evict_triggers_end` led to timeout errors in WiredTiger. (See patch here).

      We also cannot use compare-and-swap (CAS) on doubles.

      We need to redesign the configuration and access API for eviction/dirty thresholds so that we avoid holding the global reconfig lock on hot paths that read thresholds, and use a representation that is safe to update atomically, avoiding CAS on doubles.

      Possible options: 

      • Store thresholds as integers representing fixed-point values, e.g. `int32_t threshold = 125; // 12.5%`, then read the integer atomically and convert it to a double.
      • Reduce the scope of the global reconfig lock around threshold updates, or introduce a separate lock for threshold fields only.
      • Represent thresholds as part of a config snapshot struct, and atomically swap the pointer on reconfig.
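
A minimal sketch of the first option (fixed-point integers), using C11 `<stdatomic.h>` rather than WiredTiger's own atomic macros. All names here (`evict_thresholds_t`, `threshold_set_pct`, `threshold_get_pct`, `threshold_cas_tenths`, `THRESHOLD_SCALE`) are hypothetical and only illustrate the idea; the scale of one unit per 0.1% is taken from the `125 // 12.5%` example above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-point scale: one unit is 0.1%, so 125 means 12.5%. */
#define THRESHOLD_SCALE 10

/* Hypothetical holder for a single threshold; not an existing WiredTiger structure. */
typedef struct {
    _Atomic int32_t dirty_trigger_tenths;
} evict_thresholds_t;

/* Writer side (reconfigure): convert a percentage to tenths and publish it with one atomic store. */
static void
threshold_set_pct(evict_thresholds_t *t, double pct)
{
    assert(pct >= 0.0 && pct <= 100.0);
    /* Add 0.5 before truncating so the value rounds to the nearest tenth. */
    int32_t tenths = (int32_t)(pct * THRESHOLD_SCALE + 0.5);
    atomic_store_explicit(&t->dirty_trigger_tenths, tenths, memory_order_release);
}

/* Reader side (hot path): one atomic load, no lock, then convert back to a double. */
static double
threshold_get_pct(evict_thresholds_t *t)
{
    int32_t tenths = atomic_load_explicit(&t->dirty_trigger_tenths, memory_order_acquire);
    return (double)tenths / THRESHOLD_SCALE;
}

/* CAS works on the integer form: replace the value only if it still equals `expected`. */
static bool
threshold_cas_tenths(evict_thresholds_t *t, int32_t expected, int32_t desired)
{
    return atomic_compare_exchange_strong(&t->dirty_trigger_tenths, &expected, desired);
}
```

With this shape, readers never take the reconfig lock, and a checkpoint that temporarily adjusts a trigger could restore it with `threshold_cas_tenths` only if no reconfigure changed it in the meantime. Whether that restore policy is the right one is an open design question for this ticket.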

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Alana Huang