Revisit the API usage for reconfigure in disagg

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines

      The reconfigure API is designed to handle infrequent configuration changes for the database with minimal disruption to normal operations. When invoked, it temporarily pauses the eviction server to ensure smooth execution.

      However, in the disaggregated storage architecture (disagg), we call the API far more frequently, to adopt new checkpoints on the standby node. This has led to issues such as deadlocks (HELP-84317): adopting a new checkpoint involves reading and writing both the metadata and the shared metadata, and those operations can block when eviction is paused during the process.
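
      As an illustration only, the toy model below shows one way such a stall could arise. It is not WiredTiger code and does not claim to reproduce HELP-84317: ToyCache, reconfigure, adopt_checkpoint and try_adopt are hypothetical names, and the "cache is full, so a page must be evicted first" mechanism is an assumption made for the sketch.

```python
class CacheStuck(Exception):
    """Raised when a page is needed but nothing can be evicted to make room."""


class ToyCache:
    """Toy model of a fixed-size page cache with a pausable eviction server."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = []            # resident page ids, oldest first
        self.eviction_paused = False

    def read_page(self, page_id):
        if page_id in self.pages:
            return
        if len(self.pages) >= self.capacity:
            if self.eviction_paused:
                # A real reader would block here; the toy raises instead.
                raise CacheStuck(f"cache full, eviction paused, need {page_id}")
            self.pages.pop(0)      # evict the oldest page
        self.pages.append(page_id)


def reconfigure(cache, work, pause_eviction=True):
    """Run `work` as a reconfigure call would, optionally pausing eviction."""
    cache.eviction_paused = pause_eviction
    try:
        work(cache)
    finally:
        cache.eviction_paused = False


def adopt_checkpoint(cache):
    # Hypothetical stand-in: adoption touches metadata and shared metadata.
    for page_id in ("metadata", "shared_metadata"):
        cache.read_page(page_id)


def try_adopt(pause_eviction):
    cache = ToyCache(capacity=2)
    cache.read_page("user_page_1")
    cache.read_page("user_page_2")   # cache is now full
    try:
        reconfigure(cache, adopt_checkpoint, pause_eviction=pause_eviction)
        return "adopted"
    except CacheStuck:
        return "stuck"
```

      In this model, try_adopt(True) gets stuck on the first metadata read because nothing can be evicted, while try_adopt(False) completes. The more often adoption runs under a paused eviction server, the more often that window is open.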

      As a temporary workaround, I created ticket WT-15963, which allows the eviction server to continue running during reconfigure operations. However, this is not an ideal solution and it introduces risk, particularly if the reconfigure API is used to modify the eviction server itself.
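
      One possible way to limit that risk, sketched here purely as an assumption and not as what WT-15963 implements, is to keep eviction running only when the configuration string does not touch eviction settings. The helper names, the EVICTION_KEYS set (illustrative, not exhaustive), and the example strings are hypothetical. The parser is deliberately minimal and does not handle quoted values that contain commas or parentheses.

```python
# Assumed set of top-level keys that affect eviction; illustrative only.
EVICTION_KEYS = {
    "eviction",
    "eviction_target",
    "eviction_trigger",
    "eviction_dirty_target",
    "eviction_dirty_trigger",
}


def top_level_keys(config):
    """Return the top-level keys of a 'k=v,k=(a=b,c=d)' style config string."""
    keys, depth, token = [], 0, ""
    for ch in config + ",":        # trailing comma flushes the last token
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            key = token.split("=", 1)[0].strip()
            if key:
                keys.append(key)
            token = ""
        else:
            token += ch
    return keys


def must_pause_eviction(config):
    """Hypothetical policy: pause eviction only if the config touches it."""
    return any(key in EVICTION_KEYS for key in top_level_keys(config))
```

      Under this policy a checkpoint-adoption call such as must_pause_eviction("disaggregated=(checkpoint_meta=abc)") would leave eviction running, while must_pause_eviction("eviction=(threads_max=8)") would still pause it. It narrows the risk but does not settle whether reconfigure is the right entry point for checkpoint adoption.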

      Moving forward, we need to evaluate whether the reconfigure API is appropriate for frequent and resource-intensive tasks like checkpoint adoption. A more robust solution may be required to ensure stability and prevent similar issues.

      cc: alexander.gorrod@mongodb.com keith.smith@mongodb.com peter.macko@mongodb.com as it involves API changes.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Chenhao Qu