[SERVER-44356] Change Stream latency is determined by config server write rate Created: 01/Nov/19 Updated: 08/Jan/24 Resolved: 16/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Bernard Gorman | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qexec-team |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2019-12-02, Query 2019-12-16, Query 2019-12-30, Query 2020-01-13, Query 2020-01-27, Query 2020-02-10, Query 2020-02-24 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
In addition to the writes that the config servers perform to persist metadata changes, each mongoS in the cluster pings the config servers every 10s, and every component in the cluster lockpings every 30 seconds. A large and/or busy cluster will therefore not suffer much additional latency. But in the scenario where a cluster has only a single mongoS and is not actively splitting or migrating chunks, our latency guarantees go from a worst case of ~10s (if one of the shards in the cluster is completely inactive) to a minimum of ~10s regardless of how active the shards are. There are a couple of ways we could address this:
|
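To make the effect concrete, here is a minimal observation sketch using pymongo; the connection string, database, and collection names are placeholders and not from this ticket. On a quiet cluster with a single mongoS, the change stream's resume-token high-water-mark should only tick forward roughly as often as the config server performs writes, as described above.

```python
# Minimal sketch (assumed placeholders: URI, db and collection names).
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # a mongos in the cluster
coll = client["test"]["events"]

# On an idle collection the stream returns no documents, but its resume token
# (the high-water-mark used for resuming) still advances -- and on a quiet
# cluster that advancement is paced by the write rate described above.
with coll.watch(max_await_time_ms=1000) as stream:
    last_token = stream.resume_token
    for _ in range(60):
        stream.try_next()  # returns None when there is no change event
        if stream.resume_token != last_token:
            print(time.strftime("%H:%M:%S"), "high-water-mark advanced")
            last_token = stream.resume_token
```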
| Comments |
| Comment by Githook User [ 16/Feb/20 ] |
|
Author: {'username': 'gormanb', 'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com'} Message: |
| Comment by Kaloian Manassiev [ 28/Jan/20 ] |
|
We just had a meeting about this with jack.mulrow, esha.maharishi and renctan. Jack pointed out to me that the gossiped configOpTime is actually the majority-committed opTime on the config server, not the last-written opTime as the clusterTime is. As a consequence, reads from the config server (which are w:majority with afterOpTime: configOpTime) are less likely to block waiting for that opTime to become majority-committed. This in turn means that enabling the NoopWriter on the config server is not as problematic as I originally thought. |
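A toy model of the point above (illustrative Python, not server code): a w:majority read with afterOpTime=T has to wait until the config server's majority-committed opTime reaches T, so gossiping the majority-committed opTime rather than the last-written one means the wait is usually already satisfied.

```python
# Toy model: opTimes reduced to integers; not how the server represents them.
from dataclasses import dataclass

@dataclass
class ConfigServerState:
    last_written: int        # newest opTime written on the config primary
    majority_committed: int  # newest opTime replicated to a majority

def read_would_block(state: ConfigServerState, after_optime: int) -> bool:
    """A w:majority read with afterOpTime=after_optime waits while the
    majority-committed opTime still lags the requested point."""
    return state.majority_committed < after_optime

state = ConfigServerState(last_written=105, majority_committed=100)

# Gossiping the last-written opTime (as clusterTime does) could force a wait:
print(read_would_block(state, state.last_written))        # True
# Gossiping the majority-committed opTime lets the read proceed immediately:
print(read_would_block(state, state.majority_committed))  # False
```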
| Comment by Bernard Gorman [ 10/Jan/20 ] |
|
kaloian.manassiev: I'm not sure that will address the original problem as outlined in the description. How will the new clusterTopologyOpTime help us to avoid this scenario? |
| Comment by Kaloian Manassiev [ 08/Jan/20 ] |
|
Apologies for the delay in getting back to this. I didn't realise it had an impact on driver testing as well; I thought it was about performance only. Enabling the NoopWriter on the config server will introduce unnecessary waits for metadata reads, which in turn can have a negative and possibly unpredictable impact on the latency of CRUD operations that end up having to do a routing info refresh. For example, suppose a migration commits (with majority) and produces opTime 100, but before it returns to the caller shard, the NoopWriter advances the opTime to 101; now the refresh needs to wait for 101 to become majority committed instead of 100. Because of this I am hesitant to just make this change. Instead, what we are proposing to do is to introduce a clusterTopologyOpTime vector clock component, which gets advanced only when shards are added and removed. This, combined with the causal consistency provided for reads against the metadata, will ensure that if any chunk is moved to a newly added shard, any change streams will see the newly added shard when they see the chunk migration. This is not something which is quick to do, though, and will take at least a couple of iterations. |
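For illustration, a sketch of the proposed separation; the clusterTopologyOpTime name comes from the comment above, but the code is a toy model, not the server implementation. The topology component only moves on addShard/removeShard, so routine config writes such as migration commits never push it forward and never add extra majority waits for readers that only depend on topology.

```python
# Toy vector clock; component names follow the discussion, values are plain ints.
from dataclasses import dataclass

@dataclass
class VectorClock:
    cluster_time: int = 0              # advances on every replicated write
    config_op_time: int = 0            # advances on config metadata writes
    cluster_topology_op_time: int = 0  # advances ONLY on addShard/removeShard

    def on_chunk_migration_commit(self) -> None:
        self.cluster_time += 1
        self.config_op_time += 1       # topology component stays put

    def on_add_or_remove_shard(self) -> None:
        self.cluster_time += 1
        self.config_op_time += 1
        self.cluster_topology_op_time += 1

vc = VectorClock()
vc.on_chunk_migration_commit()         # busy metadata traffic...
vc.on_chunk_migration_commit()
vc.on_add_or_remove_shard()            # ...but only this bumps the topology time
# A reader that waits only on cluster_topology_op_time sees newly added shards
# without being gated on the config server's general write rate.
print(vc)
```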