Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Cluster Scalability
Cluster Scalability Priorities
3
The ask in this ticket is to always balance config.system.sessions, even when the balancer is stopped.
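The requested behavior can be sketched as a filter applied when a balancer round picks its candidate collections. This is a minimal sketch under a simplified model. `BalancerState` and `namespaces_to_balance` are hypothetical names for illustration only. They are not MongoDB server code or a driver API, and the real balancer's scheduling is considerably more involved.

```python
from dataclasses import dataclass

# Namespace that, per this ticket's proposal, is balanced even when the balancer is stopped.
SESSIONS_NS = "config.system.sessions"


@dataclass
class BalancerState:
    # Hypothetical flag: True when the balancer is running, False when it has been stopped.
    enabled: bool


def namespaces_to_balance(state: BalancerState, sharded_namespaces: list[str]) -> list[str]:
    """Return the namespaces a (hypothetical) balancer round would consider."""
    if state.enabled:
        # Current behavior: every sharded collection is a balancing candidate.
        return list(sharded_namespaces)
    # Proposed behavior: with the balancer stopped, the sessions collection is still balanced.
    return [ns for ns in sharded_namespaces if ns == SESSIONS_NS]
```

For example, `namespaces_to_balance(BalancerState(enabled=False), ["app.orders", "config.system.sessions"])` keeps only `"config.system.sessions"`. User collections stay untouched while the balancer is stopped.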
Customer problem:
When the balancer is stopped, the config.system.sessions collection is not balanced, which can lead to significant operational issues. The most critical impact is that all session-related workload (such as session creation, updates, and TTL deletions) is concentrated on a single shard, rather than being distributed across the cluster. This can cause:
- Increased write latency and resource contention on the shard holding the entire config.system.sessions collection, especially in clusters with high session churn or many connections.
- Degraded overall cluster performance, as that single shard becomes a bottleneck for session management operations.
- Risk of hitting hard limits on the number of active logical sessions if expired sessions are not deleted because the TTL process is impaired. This can block the creation of new sessions and lead to application errors or unavailability.
Why this happens
The config.system.sessions collection is sharded and, under normal circumstances, the balancer distributes its chunks across multiple shards. When the balancer is stopped (for example, for resharding, outside balancing windows, or for workload management), this balancing does not occur, and the collection can become "out of balance", meaning that all or most of its data resides on a single shard.
This is particularly problematic because the sessions collection is heavily used for all session-based operations in the cluster.
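The "out of balance" condition described above can be illustrated with a small check over chunk ownership. This is a minimal sketch that assumes chunk counts per shard as a simple proxy for data placement. The function names and the default `threshold` of 2 chunks are hypothetical and for illustration only. The actual balancer uses its own metrics and thresholds.

```python
from collections import Counter


def chunk_distribution(chunk_owners: list[str], shards: list[str]) -> dict[str, int]:
    """Count chunks per shard, including shards that currently own no chunks."""
    counts = Counter(chunk_owners)
    return {shard: counts.get(shard, 0) for shard in shards}


def is_out_of_balance(chunk_owners: list[str], shards: list[str], threshold: int = 2) -> bool:
    """Flag the collection when the most- and least-loaded shards differ by at least
    `threshold` chunks. The default threshold is an illustrative assumption."""
    if not shards:
        return False
    dist = chunk_distribution(chunk_owners, shards)
    return max(dist.values()) - min(dist.values()) >= threshold
```

With three shards and all six chunks owned by `shard0` (the situation a stopped balancer can leave behind), the check reports an imbalance. Two chunks per shard does not.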
Real-world consequences
- There have been customer cases where this led to increased write latency and resource contention on the affected shard (see https://jira.mongodb.org/browse/SERVER-97416).
- In extreme cases, if the TTL deleter is also disabled or impaired, expired sessions may not be cleaned up, eventually blocking new session creation and causing application-level failures.
is related to:
SERVER-97416 Increase the aggressiveness of balancing config.system.sessions
Backlog