[SERVER-42290] Target change streams to the primary shard when possible Created: 19/Jul/19  Updated: 28/Aug/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 2
Labels: change-streams-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-78321 MongoDB 6.0: Adding a new shard rende... Closed
is related to SERVER-30784 Allow sharded change streams to targe... Backlog
is related to SERVER-80427 Avoid change streams latency caused b... Backlog
Assigned Teams:
Query Optimization
Sprint: Query 2021-01-25, Query Optimization 2021-05-03
Participants:
Case:

 Description   

When creating a change stream on an unsharded collection, we currently open cursors on all shards rather than just the primary shard. We do so because, at the point where we are deciding upon a routing strategy, we have no insight into whether the operation is creating a new stream or resuming from a point in the past; in order to cover all possible scenarios, we must therefore assume the latter. This in turn means that we may be resuming from a time when the namespace existed in an earlier incarnation as a sharded collection which has since been dropped, and because we do not have access to a historical view of the routing table, we must again assume that this is the case. This defensive series of assumptions lets us open streams which work across all potential use-cases, at the cost of significant inefficiency and some unfortunate edge-cases (e.g. SERVER-42232) in the case of unsharded collections.

But by far the more likely scenario is that we are starting or resuming a stream on an unsharded collection which does currently exist, in which case opening streams on every shard is wasteful and opens the possibility of complications such as SERVER-42232. We already resolve the UUID of unsharded collections as part of the aggregation processing code, so by providing a means for the routing logic to examine the stream's resume information, it should be possible to target the primary shard in cases where we are opening a new stream or resuming a stream on an existing unsharded collection.


Generated at Thu Feb 08 05:00:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.