[SERVER-32013] Ensure consistent getMore polling behaviour for sharded $changeStream Created: 17/Nov/17  Updated: 26/Jul/23

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: changestreams
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Participants:

Description

Sharded $changeStream requires a regular stream of updates from each shard in order to return results to the client as soon as possible. To achieve this, we poll each shard with getMores until it returns a batch of data. If a getMore on a particular shard times out and returns an empty batch, the ARM (AsyncResultsMerger) automatically schedules further getMores on that shard until a non-empty batch is received.
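
The auto-reschedule behaviour described above can be illustrated with a minimal Python sketch. This is a toy model, not server code: `poll_shard_until_batch`, its `get_more` callback, and the `max_attempts` cap are hypothetical names introduced only for illustration, and an empty list stands in for a getMore that timed out.

```python
from typing import Callable, List, Optional, Tuple


def poll_shard_until_batch(
    get_more: Callable[[], List[dict]],
    max_attempts: Optional[int] = None,
) -> Tuple[List[dict], int]:
    """Re-issue get_more until it yields a non-empty batch.

    Returns (batch, attempts). max_attempts is a hypothetical cap that
    exists only so the sketch can terminate; None means "keep polling".
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        batch = get_more()  # an empty list models a getMore that timed out
        if batch:
            return batch, attempts
        # Empty batch: fall through and reschedule another getMore.
    return [], attempts
```

With a shard that times out twice before producing a change event, the loop issues three getMores before returning the batch.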

However, there are two circumstances in which polling of shards 'stalls':

  • If we are in the kGetMoreWithAtLeastOneResultInBatch context and we hit !_arm.ready(), we return immediately without kicking off a new round of getMores on the shard(s) whose buffers have been exhausted.
  • In the kInitialFind context we always return an empty batch immediately, regardless of maxTimeMS or batchSize, and do not schedule any shard getMores until the next client getMore is received.

Previously we attempted to extend the ARM's existing auto-reschedule behaviour to cover these scenarios, but this required scheduling getMores in the ARM constructor and in nextReady, neither of which was an appropriate place to do so. The same behaviour can instead be achieved trivially in RouterStageMerge, by scheduling and stashing an ARM event just before we return EOF from the kInitialFind or kGetMoreWithAtLeastOneResultInBatch context. We already use this approach when we time out while waiting for results in the kGetMoreNoResultsYet context.
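
The proposal can be sketched roughly as follows, in Python rather than the server's C++. Everything here is a hypothetical toy: `ToyMerger` and `ToyRouterStage` are stand-ins for the ARM and RouterStageMerge, the `reschedule_before_eof` flag switches between the stalling behaviour and the proposed one, and the wait in the kGetMoreNoResultsYet context is modelled as timing out immediately. The readiness rule is also simplified to "every shard has a buffered result".

```python
from enum import Enum, auto


class ExecContext(Enum):
    # Names mirror the three contexts mentioned in the ticket.
    INITIAL_FIND = auto()
    GET_MORE_NO_RESULTS_YET = auto()
    GET_MORE_WITH_AT_LEAST_ONE_RESULT_IN_BATCH = auto()


class ToyMerger:
    """Hypothetical stand-in for the ARM; expects at least one shard."""

    def __init__(self, shards):
        self.buffers = {shard: [] for shard in shards}
        self.in_flight = set()  # shards with an outstanding getMore

    def ready(self):
        # Simplifying assumption: ready only when every shard has a result.
        return all(self.buffers.values())

    def schedule_get_mores(self):
        # Request a getMore from every shard whose buffer is exhausted.
        for shard, buf in self.buffers.items():
            if not buf:
                self.in_flight.add(shard)
        return frozenset(self.in_flight)  # plays the role of the "event"

    def next_ready(self):
        # Return the smallest buffered result across shards (sorted merge).
        shard = min(self.buffers, key=lambda s: self.buffers[s][0])
        return self.buffers[shard].pop(0)


class ToyRouterStage:
    """Hypothetical stand-in for RouterStageMerge."""

    def __init__(self, merger, reschedule_before_eof):
        self.merger = merger
        self.reschedule_before_eof = reschedule_before_eof
        self.stashed_event = None

    def next(self, context):
        # The initial find always returns an empty batch (None models EOF).
        if context is not ExecContext.INITIAL_FIND and self.merger.ready():
            return self.merger.next_ready()
        # About to return EOF. The no-results-yet context already schedules
        # and stashes an event; the flag extends that to the other contexts.
        if (context is ExecContext.GET_MORE_NO_RESULTS_YET
                or self.reschedule_before_eof):
            self.stashed_event = self.merger.schedule_get_mores()
        return None
```

With `reschedule_before_eof=False`, a call in the `INITIAL_FIND` context leaves `in_flight` empty, which is the stall described in the ticket. With the flag set, the same call leaves a getMore outstanding on every shard, so data can arrive before the next client getMore.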

