ISSUE DESCRIPTION AND IMPACT
This bug prevents config server replica set primaries from creating new sessions. The node loses availability once 1 million sessions (an internal limit) have accumulated in its in-memory session cache. Unfortunately, this limit is eventually reached regardless of how many sessions are active at any given point.
The impact occurs because the bug prevents the node from refreshing its in-memory logical session cache from the persisted config.system.sessions collection. This failure to refresh prevents the lastUsed TTL index on config.system.sessions from removing session records, and makes reaching 1 million in-memory sessions likely.
The underlying cause is that the TTL index cannot be created on shards that do not own any config.system.sessions chunks. This failure halts the synchronization process that refreshes the in-memory logical session cache.
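The mechanism above can be illustrated with a toy model. The sketch below is a hypothetical, heavily simplified simulation and not MongoDB's implementation: `ToySessionCache`, `first_failing_round`, and the "rounds" of session activity are invented for illustration. Each round, clients open a fixed number of new sessions. When refresh works, sessions whose persisted records have expired are dropped from memory. When refresh is halted (as with the TTL index failure), the cache only grows, so the limit is hit even though the number of live sessions never changes.

```python
class TooManySessions(Exception):
    """Raised when the hypothetical cache is full."""


class ToySessionCache:
    """Hypothetical stand-in for the in-memory logical session cache."""

    def __init__(self, max_sessions: int) -> None:
        self.max_sessions = max_sessions
        self.sessions: set = set()

    def start_session(self, session_id: int) -> None:
        # Creating a session fails once the cache holds max_sessions entries.
        if len(self.sessions) >= self.max_sessions:
            raise TooManySessions(f"limit of {self.max_sessions} reached")
        self.sessions.add(session_id)

    def refresh(self, persisted: set, refresh_works: bool) -> None:
        # Models the bug: the refresh aborts before reaping anything.
        if not refresh_works:
            return
        # Keep only sessions whose records still exist in the persisted collection.
        self.sessions &= persisted


def first_failing_round(max_sessions: int, active_per_round: int,
                        refresh_works: bool, rounds: int):
    """Return the round in which session creation first fails, or None."""
    cache = ToySessionCache(max_sessions)
    next_id = 0
    for round_no in range(rounds):
        live = set()
        for _ in range(active_per_round):
            try:
                cache.start_session(next_id)
            except TooManySessions:
                return round_no
            live.add(next_id)
            next_id += 1
        # Assumption: by the end of a round, older session records have
        # expired, so the persisted collection holds only this round's sessions.
        cache.refresh(persisted=live, refresh_works=refresh_works)
    return None
```

For example, `first_failing_round(1000, 100, refresh_works=False, rounds=50)` fails in round 10 with only 100 sessions live at any time, while the same call with `refresh_works=True` never fails.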
DIAGNOSIS AND AFFECTED VERSIONS
MongoDB versions 4.2.0 through 4.2.5 are affected by this bug. Signs that the issue is occurring include:
- Sharding commands fail
- Chunk migrations fail
- Inability to access the config server primary
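A rough check for exposure can combine the version range above with the size of the in-memory session cache. The sketch below is hypothetical: the helper names are invented, and it assumes that the `logicalSessionRecordCache.activeSessionsCount` field of `serverStatus` output approximates the number of cached sessions counted against the 1 million limit. It takes plain values so it can run without a live server.

```python
DEFAULT_MAX_SESSIONS = 1_000_000  # default limit cited in this issue
AFFECTED_MIN = (4, 2, 0)
AFFECTED_MAX = (4, 2, 5)


def parse_version(version: str) -> tuple:
    # "4.2.3" -> (4, 2, 3); anything after a "-" suffix is ignored.
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def is_affected_version(version: str) -> bool:
    """True if the version falls in the affected 4.2.0-4.2.5 range."""
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX


def session_cache_usage(server_status: dict,
                        max_sessions: int = DEFAULT_MAX_SESSIONS) -> float:
    """Fraction of the session limit in use, per the assumed serverStatus field."""
    active = server_status["logicalSessionRecordCache"]["activeSessionsCount"]
    return active / max_sessions
```

With PyMongo, the inputs could come from `client.server_info()["version"]` and `client.admin.command("serverStatus")` run against the config server primary. A usage fraction that keeps climbing toward 1.0 on an affected version is consistent with the symptoms listed above.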
REMEDIATION AND WORKAROUNDS
To restore availability, kill and restart the config server primary, and allow the resulting replica set failover to reset the session cache on the newly elected primary.
Setting the maxSessions server parameter above its default of 1 million can delay, but not prevent, the onset of this issue.
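Because cached sessions are not reaped while the bug is active, raising maxSessions buys time roughly in proportion to the new limit. The sketch below is a hypothetical back-of-the-envelope helper: the function names and the session creation rate are illustrative assumptions, and it assumes maxSessions is supplied at startup through `--setParameter`.

```python
def set_parameter_args(max_sessions: int) -> list:
    # Assumption: maxSessions is passed to mongod at startup via --setParameter.
    return ["--setParameter", f"maxSessions={max_sessions}"]


def hours_until_limit(max_sessions: int, new_sessions_per_hour: float) -> float:
    # Assumes the cache grows linearly with the session creation rate,
    # since no sessions are reaped while the refresh is halted.
    return max_sessions / new_sessions_per_hour
```

For instance, at a hypothetical 10,000 new sessions per hour, the default limit would be reached after about 100 hours, and a limit of 4,000,000 after about 400 hours. This only postpones the failure, so a restart (or an upgrade past the affected versions) is still needed eventually.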