Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.4
Affects Version/s: None
Component/s: Aggregation Framework
Labels: qexec-team
Backwards Compatibility: Fully Compatible
Sprint: Query 2019-12-02, Query 2019-12-16, Query 2019-12-30, Query 2020-01-13, Query 2020-01-27, Query 2020-02-10, Query 2020-02-24
Linked BF Score: 0
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In ~~SERVER-42723~~, we introduced an improved mechanism for change streams to detect the addition of new shards to the cluster. This involves opening an internal cursor on the config servers alongside the shard streams, in order to monitor for writes to the config.shards collection. While this approach guarantees that we will always correctly pick up new shards, it makes the change stream's latency dependent upon the config server write rate. From the ARM's perspective, the config server cursor is just another shard in the sorted stream, so the ARM cannot return events from any shard until the config server writes an oplog entry whose clusterTime surpasses that of those events.
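The blocking behaviour described above can be illustrated with a toy model of a sorted merge. This is only a sketch under simplifying assumptions, not the server's ARM code: the `Cursor` class, the `releasable` function, and the integer clusterTimes are all hypothetical names and values chosen for illustration. The idea is that an event is only safe to return once every cursor, including the config server cursor, has reported a clusterTime at least as large.

```python
from dataclasses import dataclass, field


@dataclass
class Cursor:
    """Hypothetical stand-in for one remote cursor in the merged stream."""
    name: str
    high_water_mark: int = 0  # latest clusterTime observed on this cursor
    buffered: list = field(default_factory=list)  # clusterTimes of pending events


def releasable(cursors):
    """Return (clusterTime, cursor name) pairs that are safe to emit in order.

    An event is safe only if no cursor could still produce an earlier one,
    i.e. its clusterTime does not exceed the smallest high-water mark.
    """
    safe_up_to = min(c.high_water_mark for c in cursors)
    events = [(t, c.name) for c in cursors for t in c.buffered if t <= safe_up_to]
    return sorted(events)


shard_a = Cursor("shardA", high_water_mark=105, buffered=[101, 105])
shard_b = Cursor("shardB", high_water_mark=104, buffered=[103])
config = Cursor("config", high_water_mark=100)  # idle config server

# Both shards have events, but the idle config cursor holds everything back.
blocked = releasable([shard_a, shard_b, config])

# Once the config server writes any oplog entry, its mark advances.
config.high_water_mark = 110
released = releasable([shard_a, shard_b, config])

print(blocked)
print(released)
```

In this model the shards' own activity is irrelevant while the config cursor lags: nothing is released until its high-water mark moves past the buffered events.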

In addition to the writes that the config servers perform to persist metadata changes, each mongoS in the cluster pings the config servers every 10 seconds, and every component in the cluster lockpings every 30 seconds. A large and/or busy cluster will therefore not suffer much additional latency. But in the scenario where a cluster has only a single mongoS and is not actively splitting or migrating, our latency guarantees go from a worst case of ~10s, which applies only if one of the shards in the cluster is completely inactive, to a minimum of ~10s regardless of how active the shards are.
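As a rough illustration of why cluster size matters here, the sketch below computes the largest gap between consecutive config server writes for a set of periodic writers. It is a simplified model, not a measurement: `max_write_gap` is a hypothetical helper, and the assumption that pings from different mongoS instances are evenly staggered is mine, not something the ticket states.

```python
def max_write_gap(writers, horizon):
    """Largest gap in seconds between consecutive writes within [0, horizon].

    writers: list of (period_secs, phase_secs) pairs, one per periodic writer.
    """
    times = sorted({
        phase + k * period
        for period, phase in writers
        for k in range((horizon - phase) // period + 1)
    })
    return max(later - earlier for earlier, later in zip(times, times[1:]))


# One mongoS pinging every 10s plus one lockping every 30s (aligned phases).
single_mongos_gap = max_write_gap([(10, 0), (30, 0)], horizon=60)

# Five mongoS instances, assumed to be staggered 2s apart.
five_mongos_gap = max_write_gap([(10, p) for p in (0, 2, 4, 6, 8)], horizon=60)

print(single_mongos_gap, five_mongos_gap)
```

Under these assumptions, the single-mongoS cluster can go a full ping interval without any config server write, while the staggered five-mongoS cluster sees a write every couple of seconds.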

There are a couple of ways we could address this:

1. Ask Sharding if they are prepared to enable writePeriodicNoops on the config servers by default, and set it to a high frequency of maybe 1-2 seconds to minimize the impact on change streams. Each no-op entry is 103 bytes, which even at a 1-second frequency is only 370,800 bytes (~362 KiB) per hour before compression.

2. Use both the old and new shard-detection mechanisms. The new approach is applicable to all cases, but only strictly necessary for (a) whole-cluster streams, and (b) streams which are opened on a database that does not yet exist. Aside from those cases, we do not need to open a cursor on the config servers at all. In the case of (b), we could also close the config.shards cursor as soon as we see the first event in the stream.

3. Do nothing and consider this another flavour of activity-dependent latency.
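The first two options above can be sketched in a few lines. This is an illustrative calculation and predicate only: the 103-byte entry size comes from the description, while `noop_bytes_per_hour` and `needs_config_cursor` are hypothetical helper names, not server functions.

```python
NOOP_ENTRY_BYTES = 103  # per-entry size quoted in the description


def noop_bytes_per_hour(interval_secs: int) -> float:
    """Uncompressed oplog volume added by periodic no-ops at a given interval."""
    return NOOP_ENTRY_BYTES * 3600 / interval_secs


def needs_config_cursor(whole_cluster: bool, database_exists: bool) -> bool:
    """Cases in which the config.shards cursor is strictly necessary.

    Assumes the two cases named in the description are exhaustive:
    whole-cluster streams, and streams on a database that does not yet exist.
    """
    return whole_cluster or not database_exists


for interval in (1, 2):
    volume = noop_bytes_per_hour(interval)
    print(f"{interval}s interval: {volume:.0f} bytes/hour (~{volume / 1024:.0f} KiB)")

# A stream on a collection in an existing database could skip the config cursor.
print(needs_config_cursor(whole_cluster=False, database_exists=True))
```

The predicate makes the trade-off of the second option visible: only streams that could miss a shard under the old mechanism would pay the config server latency cost.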

is related to:
- SERVER-42723 New shard with new database can be ignored by change streams (Closed)
- SERVER-80427 Avoid change streams latency caused by lack of writes on a shard (Backlog)

Assignee: Bernard Gorman
Reporter: Bernard Gorman
Participants: Bernard Gorman, Githook User, Kaloian Manassiev
Votes: 0
Watchers: 16

Created: Nov 01 2019 10:37:29 AM UTC
Updated: Jan 08 2024 03:23:00 PM UTC
Resolved: Feb 16 2020 03:06:29 AM UTC
Confidence Status Last Update: 29/Jan/20 11:54 PM
