Picking up new checkpoints or step-up and step-down can stall eviction


    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2025-11-07
    • 3

      We currently use the reconfigure API to drive operations such as picking up new checkpoints, stepping up, and stepping down. These operations are expensive because they read and write metadata. In addition, the current implementation effectively disables the eviction server while a reconfiguration is in progress, because its walk over the data handles exits early:

          while (slot < max_entries && loop_count++ < conn->dhandle_count) {
              /* We're done if shutting down or reconfiguring. */
              if (F_ISSET_ATOMIC_32(conn, WT_CONN_CLOSING | WT_CONN_RECONFIGURING))
                  break;
      

      This behaviour can stall both the eviction server and the reconfigure thread. If the reconfigure thread itself needs to perform eviction, it can get stuck because the eviction queue is empty, potentially resulting in a deadlock.

      As a temporary fix, allow the eviction server to keep working for disaggregated storage (disagg) when WT_CONN_RECONFIGURING is set.
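
      A minimal sketch of what the relaxed exit condition could look like, written as a standalone model rather than as the actual change. The struct, the SKETCH_* flags, the disagg field, and both function names are hypothetical stand-ins introduced for illustration; the real code uses the connection flags shown in the excerpt above, and how a disagg connection is actually detected is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the connection flags; the values are illustrative only. */
#define SKETCH_CONN_CLOSING 0x1u
#define SKETCH_CONN_RECONFIGURING 0x2u

/* Hypothetical, simplified connection state used only by this sketch. */
struct sketch_conn {
    uint32_t flags;         /* Bitmask of SKETCH_CONN_* flags. */
    uint32_t dhandle_count; /* Number of data handles to walk. */
    bool disagg;            /* Assumed marker for a disaggregated-storage connection. */
};

/*
 * Return true if the eviction walk should stop early. Shutdown always stops the walk;
 * reconfiguration stops it only when the connection is not disaggregated.
 */
static bool
sketch_evict_walk_should_stop(const struct sketch_conn *conn)
{
    if (conn->flags & SKETCH_CONN_CLOSING)
        return (true);
    if ((conn->flags & SKETCH_CONN_RECONFIGURING) && !conn->disagg)
        return (true);
    return (false);
}

/*
 * Model one pass of the walk: each data handle visited fills one queue slot. Returns the
 * number of slots filled.
 */
static uint32_t
sketch_evict_walk(const struct sketch_conn *conn, uint32_t max_entries)
{
    uint32_t loop_count = 0, slot = 0;

    while (slot < max_entries && loop_count++ < conn->dhandle_count) {
        if (sketch_evict_walk_should_stop(conn))
            break;
        ++slot;
    }
    assert(slot <= max_entries);
    return (slot);
}
```

      In this model, a non-disagg connection that is reconfiguring fills no slots, which corresponds to the empty eviction queue described above, while a disagg connection in the same state keeps filling the queue. Shutdown still stops the walk in both cases.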

            Assignee:
            Chenhao Qu
            Reporter:
            Chenhao Qu
            Votes:
            0
            Watchers:
            3