Investigate changes in SPM-4262: Mechanism to stop non-critical maintenance operations when overloaded

    • Type: Investigation
    • Resolution: Won't Do
    • Priority: Major - P3
    • No version
    • Affects Version/s: None
    • Component/s: None
    • None
    • Not Needed
    • Developer Tools

      Original Downstream Change Summary

      Introduces a new command to configure background tasks and adds a new section to serverStatus to report their status.
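
      As a rough illustration of the shape of such a change, the sketch below models a configuration call that pauses or resumes named background tasks, plus a status report of the kind a serverStatus section might expose. Everything in it is an assumption made for illustration: the class `BackgroundTaskControl`, its methods, and the task names (borrowed from the examples in the linked ticket's motivation) are hypothetical and do not reflect the actual command or serverStatus fields, which this ticket does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical task names, borrowed from the examples in the linked ticket.
# They are NOT the real server's identifiers.
NON_CRITICAL_TASKS = ("migrations", "rangeDeletions", "indexBuilds", "ttlDeleter")


@dataclass
class BackgroundTaskControl:
    """Tracks which non-critical background tasks are currently allowed to run."""

    enabled: Dict[str, bool] = field(
        default_factory=lambda: {name: True for name in NON_CRITICAL_TASKS}
    )

    def configure(self, **settings: bool) -> None:
        """Enable (True) or pause (False) tasks by name; reject unknown names."""
        unknown = set(settings) - set(self.enabled)
        if unknown:
            raise ValueError(f"unknown background task(s): {sorted(unknown)}")
        self.enabled.update(settings)

    def may_run(self, name: str) -> bool:
        """A task would check this before starting its next unit of work."""
        return self.enabled[name]

    def status(self) -> Dict[str, str]:
        """Per-task report, analogous to a status section in a server report."""
        return {
            name: "running" if on else "paused"
            for name, on in self.enabled.items()
        }


control = BackgroundTaskControl()
# An overload policy (out of scope here) might pause the cheapest-to-defer work.
control.configure(ttlDeleter=False, migrations=False)
print(control.status())
```

      Keeping the pause switch separate from the policy that flips it mirrors the split described in the linked ticket: the mechanism only stops and reports, and a separate policy decides when to use it.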

      Description of Linked Ticket

      Epic Summary

      Summary

      Introduce mechanisms that provide the capability to stop non-critical background maintenance operations, which can then be utilized by separate system policies to manage overload conditions.

      Motivation

      During periods of overload, many operations may compete for resources simultaneously. Currently, all operations have the same priority when acquiring and using these resources. However, several background tasks (such as migrations, range deletions, index builds, and the TTL deleter) are not critical and can be safely paused without posing an immediate risk to cluster health.

      As a first step, this project aims to make it possible to halt these background processes by providing granular and customizable mechanisms to do so.

      Additionally, these mechanisms will serve as foundational tools for policy-driven overload protection strategies, ultimately ensuring the system prioritizes high-value work during periods of contention. This approach will reduce the number of operations that need to be processed, helping the cluster return to a healthy state.

      Documentation

      Scope
      Technical Design
      Docs Update

              Assignee:
              Unassigned
              Reporter:
              Backlog - Core Eng Program Management Team
              Votes:
              0
              Watchers:
              2

                Created:
                Updated:
                Resolved: