In SERVER-42723, we introduced an improved mechanism for change streams to detect the addition of new shards to the cluster. This involves opening an internal cursor on the config servers alongside the shard streams, in order to monitor for writes to the config.shards collection. While this approach guarantees that we will always correctly pick up new shards, it also makes the change stream's latency dependent upon the config server write rate. From the perspective of the AsyncResultsMerger (ARM), the config server cursor is just another shard in the sorted stream, so the ARM cannot return results from any shard until the config server writes an oplog entry which surpasses the clusterTime of those events.
In addition to the writes that the config servers perform to persist metadata changes, each mongoS in the cluster pings the config servers every 10 seconds, and every component in the cluster lockpings every 30 seconds. A large and/or busy cluster will therefore not suffer much additional latency. But in the scenario where a cluster has only a single mongoS and is not actively splitting or migrating, our latency guarantees go from a worst case of ~10s, which applies only if one of the shards in the cluster is completely inactive, to a minimum of ~10s regardless of how active the shards are.
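To make the dependency concrete, here is a minimal Python sketch of the merge rule described above. It is a simplified model, not the actual ARM code: the function name `returnable_events`, the cursor names, and the integer clusterTimes are all hypothetical. The only rule it encodes is that an event can be returned once every other cursor, including the config server cursor, has reported a clusterTime that surpasses it.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (clusterTime, name of the cursor that produced it)


def returnable_events(buffered: List[Event], latest_seen: Dict[str, int]) -> List[Event]:
    """Return the buffered events that are safe to hand back, in clusterTime order.

    Simplified model (an assumption, not the real ARM logic): an event is safe
    once every *other* cursor has reported a clusterTime strictly greater than
    the event's clusterTime.
    """
    ready = []
    for cluster_time, source in sorted(buffered):
        others = (t for name, t in latest_seen.items() if name != source)
        if all(t > cluster_time for t in others):
            ready.append((cluster_time, source))
    return ready


# Two busy shards, but the config server last wrote at clusterTime 100.
buffered = [(101, "shardA"), (103, "shardB"), (106, "shardA")]
stalled = returnable_events(buffered, {"shardA": 106, "shardB": 107, "config": 100})
print(stalled)  # nothing can be returned yet

# Once the config server writes an entry at clusterTime 110, the events unblock.
unblocked = returnable_events(buffered, {"shardA": 106, "shardB": 107, "config": 110})
print(unblocked)
```

In this model the shards' own activity is irrelevant while the config cursor lags: the buffered events become returnable only when the config server's next write arrives.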
There are a couple of ways we could address this:
- Ask Sharding if they are prepared to enable writePeriodicNoops on the config servers by default, and set it to a high frequency of maybe 1-2 seconds to minimize the impact on change streams. Each no-op entry is 103 bytes, which even at 1-second frequency is only ~362KiB (about 371kB) per hour before compression.
- Use both the old and new shard-detection mechanisms. The new approach is applicable to all cases, but only strictly necessary for (1) whole-cluster streams, and (2) streams which are opened on a database that does not yet exist. Aside from those cases, we do not need to open a cursor on the config servers at all. In the case of (2), we could also close the config.shards cursor as soon as we see the first event in the stream.
- Do nothing and consider this another flavour of activity-dependent latency.
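
As a quick check on the oplog overhead quoted for the first option, the arithmetic can be reproduced as below. The 103-byte entry size comes from the text above; the function name `noop_bytes_per_hour` is hypothetical, and the sketch assumes exactly one no-op entry per interval with no other overhead.

```python
NOOP_ENTRY_BYTES = 103  # size of one no-op oplog entry, as stated above
SECONDS_PER_HOUR = 3600


def noop_bytes_per_hour(interval_seconds: float, entry_bytes: int = NOOP_ENTRY_BYTES) -> float:
    """Uncompressed bytes of no-op entries written per hour at the given interval."""
    return entry_bytes * SECONDS_PER_HOUR / interval_seconds


for interval in (1, 2):
    total = noop_bytes_per_hour(interval)
    print(f"{interval}s interval: {total:.0f} bytes/hour = {total / 1024:.1f} KiB/hour")
```

At a 1-second interval this gives 370,800 bytes per hour, which is the ~362 figure when expressed in units of 1024 bytes; a 2-second interval halves it.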