[SERVER-30784] Allow sharded change streams to target just the shards that have chunks Created: 22/Aug/17  Updated: 28/Aug/23

Status: Backlog
Project: Core Server
Component/s: Querying, Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 2
Labels: change-streams-improvements, changestreams
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-42290 Target change streams to the primary ... Backlog
is related to SERVER-80427 Avoid change streams latency caused b... Backlog
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Currently change streams on sharded collections target all shards in the cluster, regardless of which shards actually have data for that collection. In order for change streams to target just the shards that have chunks, there are a few things we need.
First, we need to be able to reconstruct the routing table at any point in the past, so that when we resume a stream we know which shards had chunks at the timestamp the stream is resuming from.
Second, we need chunk migration commit to be a multi-document transaction. If the donor shard recorded a chunk-migration-commit operation at the same optime as the corresponding commit operation on the config server, then when a change stream on the donor shard encounters that commit operation it could consult the multi-version routing table to determine whether the recipient shard already had any other chunks for the collection as of the commit time; if it did not, the donor's stream could close its change stream cursor to force mongos to retarget and open a cursor on the recipient. We could then remove the no-op oplog entry we currently write on the donor shard when we migrate a chunk to a shard that has no chunks for that collection.
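
To make the first requirement concrete, here is a minimal, self-contained sketch (not actual server code; RoutingHistory, shardsWithChunksAt, recordPlacement and the shard names are all hypothetical) of a multi-version routing table that can answer "which shards had chunks for this collection at timestamp T?". Both the resume path and the donor-side check at a migration commit would consult something shaped like this.

// Conceptual sketch only: models a multi-version routing table keyed by the
// commit time of each placement change. Names are hypothetical, not MongoDB APIs.
#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <utility>

using Timestamp = std::uint64_t;
using ShardId = std::string;

class RoutingHistory {
public:
    // Record the full set of chunk-owning shards as of a placement change.
    void recordPlacement(Timestamp commitTime, std::set<ShardId> owners) {
        _history[commitTime] = std::move(owners);
    }

    // Return the owners as of the most recent placement at or before 'atTime'.
    std::set<ShardId> shardsWithChunksAt(Timestamp atTime) const {
        auto it = _history.upper_bound(atTime);
        if (it == _history.begin())
            return {};  // No placement known at or before this time.
        return std::prev(it)->second;
    }

private:
    std::map<Timestamp, std::set<ShardId>> _history;
};

int main() {
    RoutingHistory history;
    history.recordPlacement(100, {"shardA"});            // Collection created on shardA.
    history.recordPlacement(200, {"shardA", "shardB"});  // Chunk migrated to shardB.

    // Resuming a change stream from timestamp 150: only shardA needs a cursor.
    for (const auto& shard : history.shardsWithChunksAt(150))
        std::cout << "target at ts=150: " << shard << "\n";

    // Donor-side check at a migration commit recorded at ts=200: shardB now owns
    // chunks it did not own just before, so the donor's stream could close its
    // cursor to force mongos to retarget and open a cursor on shardB.
    auto before = history.shardsWithChunksAt(199);
    auto after = history.shardsWithChunksAt(200);
    for (const auto& shard : after)
        if (!before.count(shard))
            std::cout << "new shard gained its first chunk: " << shard << "\n";
    return 0;
}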



 Comments   
Comment by Spencer Brody (Inactive) [ 23/Aug/17 ]

It might be possible to do this without making chunk migration commit a true multi-document transaction with the same commit time on all involved parties (donor, recipient, config server). Currently we log an oplog entry on the donor shard when entering the critical section of a migration that moves a chunk to a shard that has no chunks for that collection.

What prevents us from targeting just the shards that have chunks today is that when we encounter that oplog entry and send a retryNeeded notification to the mongos, the migration might not have committed yet, so the mongos may reload its routing table and still not discover that it needs to open a cursor on the new shard.

To fix this, we could make the change stream on the donor shard block when it encounters the oplog entry from the beginning of the critical section, until it becomes aware that the migration has committed. Only after determining that the migration has committed would it return the 'retryNeeded' notification to the mongos, using the timestamp of the critical-section oplog entry as the resumeToken to ensure nothing is missed on the new shard. Because the donor shard waits until it observes that the migration is committed, the cluster time of the response will be after the migration commit, so when the mongos reloads the routing table it is guaranteed to pick up the changes from that commit.
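
For illustration, here is a minimal sketch of the donor-side behavior described above, under the assumption that the blocking can be modeled as a simple wait; Notification, onCriticalSectionEntry and the timestamps are hypothetical names, not server APIs.

// Conceptual sketch: on seeing the critical-section no-op oplog entry, wait until
// the migration is observed committed, then surface 'retryNeeded' with the
// critical-section timestamp as the resume token.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <thread>

using Timestamp = std::uint64_t;

struct Notification {
    std::string kind;       // e.g. "retryNeeded"
    Timestamp resumeToken;  // Where mongos resumes so the new shard's cursor misses nothing.
};

// Hypothetical helper: blocks until migrationCommitted() reports true. A real
// implementation would wait on migration/replication state, not a poll loop.
Notification onCriticalSectionEntry(Timestamp criticalSectionTs,
                                    const std::function<bool()>& migrationCommitted) {
    while (!migrationCommitted())
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    // Because we waited, the response's cluster time is at or after the migration
    // commit, so a routing-table refresh triggered by this notification sees the new owner.
    return {"retryNeeded", criticalSectionTs};
}

int main() {
    std::atomic<bool> committed{false};
    std::thread committer([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        committed = true;  // Simulate the migration commit becoming visible on the donor.
    });

    Notification n = onCriticalSectionEntry(/*criticalSectionTs=*/123,
                                            [&] { return committed.load(); });
    committer.join();
    std::cout << n.kind << " resumeToken=" << n.resumeToken << "\n";
    return 0;
}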

Comment by Spencer Brody (Inactive) [ 22/Aug/17 ]

Once we do this we can stop making sharded change streams start at the timestamp at which the ShardRegistry was last reloaded, and can instead rely on the chunk versioning protocol to ensure we have targeted the right set of shards.
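
A rough sketch of the start-time change this would allow; both functions and their names are hypothetical and only restate the comment above: today the stream is anchored at the last ShardRegistry reload time, whereas afterwards it could honor the requested time and let the chunk-versioning protocol correct any stale targeting.

#include <cstdint>
#include <iostream>

using Timestamp = std::uint64_t;

Timestamp currentStartTime(Timestamp /*requestedTime*/, Timestamp shardRegistryReloadTime) {
    // Current behavior per this comment: anchor the stream at the last
    // ShardRegistry reload so the targeted shard set is known to be current.
    return shardRegistryReloadTime;
}

Timestamp proposedStartTime(Timestamp requestedTime) {
    // Proposed behavior: honor the requested time; stale targeting is caught
    // and corrected by the chunk-versioning protocol.
    return requestedTime;
}

int main() {
    std::cout << currentStartTime(100, 150) << " vs " << proposedStartTime(100) << "\n";
    return 0;
}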
