When the TTL Monitor encounters a StaleConfig error indicating that the sharding metadata for a collection needs to be recovered, it spawns an async thread to execute that recovery and then moves on to the next collection. On clusters with many collections that have TTL indexes, this can spawn a large number of threads, particularly during startup, when the sharding metadata is unknown for all collections. This can cause resource exhaustion from the number of threads and the memory they consume, as well as thundering herd effects on the configsvr handling the metadata refreshes. We should limit the number of threads that the TTL Monitor can start for sharding metadata recovery.
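A rough sketch of the proposed limit, written in Python for illustration only; it is not the server's implementation. The names `StaleConfigError`, `ttl_pass`, `delete_expired`, and `recover_metadata` are hypothetical stand-ins, and the cap of 4 workers is an arbitrary assumption. The idea is that the TTL pass still moves on immediately after a stale collection, but recoveries are queued on a fixed-size pool instead of each getting its own thread.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, wait


class StaleConfigError(Exception):
    """Hypothetical stand-in for the server's StaleConfig error."""


def ttl_pass(namespaces, delete_expired, recover_metadata, pool):
    """Run one TTL pass over the given namespaces.

    When a namespace reports stale sharding metadata, its recovery is
    queued on the bounded pool and the pass continues with the next one.
    Returns the futures for the queued recoveries.
    """
    pending = []
    for ns in namespaces:
        try:
            delete_expired(ns)
        except StaleConfigError:
            # submit() does not block; recoveries beyond the pool size
            # wait in the pool's queue rather than starting new threads.
            pending.append(pool.submit(recover_metadata, ns))
    return pending


if __name__ == "__main__":
    def delete_expired(ns):
        # Simulate startup: metadata is unknown for every collection.
        raise StaleConfigError(ns)

    def recover_metadata(ns):
        time.sleep(0.01)  # simulate a metadata refresh round trip

    # Assumed cap of 4 concurrent recoveries for 50 collections.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = ttl_pass([f"db.coll{i}" for i in range(50)],
                           delete_expired, recover_metadata, pool)
        wait(futures)
    print(f"queued {len(futures)} recoveries on a pool of 4 threads")
```

With a fixed pool, both the thread count on the shard and the number of simultaneous refresh requests reaching the configsvr are bounded by the pool size, at the cost of some collections waiting longer for their first successful TTL pass.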
- is caused by: SERVER-63245 TTL Monitor thread doesn't recover the shard version (Closed)