Decouple balancing of config.system.sessions from sh.stopBalancer()


    • Cluster Scalability
    • Cluster Scalability Priorities
    • 3

      The ask in this ticket is to always balance config.system.sessions, even when the balancer is stopped.

      Customer problem:
      When the balancer is stopped, the config.system.sessions collection is not balanced, which can lead to significant operational issues. The most critical impact is that all session-related workload (such as session creation, updates, and TTL deletions) is concentrated on a single shard rather than distributed across the cluster. This can cause:
      • Increased write latency and resource contention on the shard holding the entire config.system.sessions collection, especially in clusters with high session churn or many connections.
      • Degraded cluster performance, as the single shard becomes a bottleneck for session management operations.
      • Risk of hitting hard limits on the number of active logical sessions if expired sessions are not deleted because TTL processes are impacted, which can block the creation of new sessions and lead to application errors or unavailability.
      Why this happens
      The config.system.sessions collection is sharded and, under normal circumstances, the balancer ensures its chunks are distributed across multiple shards. When the balancer is stopped (for resharding, balancing windows, or workload management), this balancing does not occur, and the collection can become "out of balance", meaning all or most of its data resides on a single shard.
      This is particularly problematic because the sessions collection is heavily used by all session-based operations in the cluster.
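
      As a rough illustration of what "out of balance" means here, the sketch below counts chunks per shard and reports the share held by the most loaded shard. The helper names (chunkCountsByShard, largestShardShare) and the sample data are hypothetical and not part of any MongoDB API. The input is assumed to be an array of documents shaped like entries in config.chunks, each with a `shard` field. How those documents are fetched for config.system.sessions is left out, since the config.chunks schema differs between server versions.

```javascript
// Hypothetical helper: count chunks per shard.
// Assumes each document has a `shard` field naming the owning shard,
// as documents in config.chunks do.
function chunkCountsByShard(chunkDocs) {
  const counts = {};
  for (const doc of chunkDocs) {
    counts[doc.shard] = (counts[doc.shard] || 0) + 1;
  }
  return counts;
}

// Hypothetical helper: fraction of all chunks held by the most loaded shard.
// Returns 0 for an empty input; a value near 1 means nearly all chunks
// sit on a single shard.
function largestShardShare(chunkDocs) {
  const counts = Object.values(chunkCountsByShard(chunkDocs));
  if (counts.length === 0) return 0;
  const total = counts.reduce((a, b) => a + b, 0);
  return Math.max(...counts) / total;
}

// Illustrative data only, not taken from a real cluster.
const sampleChunks = [
  { shard: "shard0" },
  { shard: "shard0" },
  { shard: "shard0" },
  { shard: "shard1" },
];

console.log(chunkCountsByShard(sampleChunks));
console.log(largestShardShare(sampleChunks));
```

      Chunk count is only a proxy for load; a share close to 1 for config.system.sessions would nonetheless match the situation described in this ticket.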
      Real-world consequences
      There have been customer cases where this led to increased write latency and resource contention on the affected shard: https://jira.mongodb.org/browse/SERVER-97416
      In extreme cases, if the TTL deleter is also disabled or impacted, expired sessions may not be cleaned up, eventually blocking new session creation and causing application-level failures.

            Assignee: Unassigned
            Reporter: Adi Zaimi
            Votes: 0
            Watchers: 3

              Created:
              Updated: