Prevent the TTL Monitor from spawning an unbounded amount of threads for sharding metadata recovery


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.3.0-rc0
    • Affects Version/s: 6.0.0, 7.0.0, 8.0.0, 8.3.0-rc0, 8.2.0
    • Component/s: Catalog, TTL
    • Catalog and Routing
    • Fully Compatible
    • v8.2, v8.0, v7.0
    • CAR Team 2026-02-16
    • 2
    • 🟦 Shard Catalog

      When the TTL Monitor encounters a StaleConfig error indicating that the sharding metadata for a collection needs to be recovered, it spawns an async thread to execute that recovery and then moves on to the next collection. On clusters with many collections that have TTL indexes, this can spawn a large number of threads, particularly during startup, when the sharding metadata is unknown for all collections. This can cause resource exhaustion from the thread count and memory use, as well as thundering herd effects on the configsvr handling the metadata refreshes. We should limit the number of threads that the TTL Monitor can start for sharding metadata recovery.
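
      The general idea of capping concurrent recovery threads can be illustrated with a minimal Python sketch. This is not the server's implementation or the actual fix for this ticket; `BoundedRecoveryScheduler`, `try_schedule`, and the `recover` callback are hypothetical names, and the policy of skipping a collection when no slot is free (so that a later TTL pass retries it) is an assumption made for illustration.

```python
import threading


class BoundedRecoveryScheduler:
    """Hypothetical helper that caps concurrent metadata-recovery threads."""

    def __init__(self, max_concurrent):
        # One semaphore slot per recovery thread allowed to run at once.
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_schedule(self, recover, collection):
        """Start recover(collection) on a new thread if a slot is free.

        Returns the started Thread, or None when the cap is reached, in
        which case the caller skips the collection and retries later.
        """
        # Non-blocking acquire: the monitor loop is never stalled here.
        if not self._slots.acquire(blocking=False):
            return None

        def run():
            try:
                recover(collection)
            finally:
                # Free the slot even if the recovery raised.
                self._slots.release()

        thread = threading.Thread(target=run, daemon=True)
        try:
            thread.start()
        except RuntimeError:
            # The thread could not be started, so give the slot back.
            self._slots.release()
            raise
        return thread
```

      With a cap of N, at most N recoveries are in flight at any time, which bounds both the thread count and the number of simultaneous refresh requests sent to the config server. A shared fixed-size thread pool would be another way to get the same bound.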

            Assignee:
            Jordi Serra Torrens
            Reporter:
            Jordi Serra Torrens
            Votes:
            0
            Watchers:
            12

              Created:
              Updated:
              Resolved: