[SERVER-48526] Lengthy oplog scans may cause difficulty resuming change streams in a sharded cluster Created: 01/Jun/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: change-streams-improvements
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Query Execution
Participants:
Case:

 Description   

If a change stream is resumed in a sharded cluster, all shards that have no oplog entries for the relevant namespace (either because they never had any data for the collection, or because they have not taken any writes recently) will scan through their oplogs until reaching EOF before returning their first getMore batch to mongoS. If the oplogs are large and the requested resume point is early, then this may take so long that the cursors on shards which do have oplog entries time out, effectively making the stream unresumable. This problem could manifest even after SERVER-30784 is complete, although the likelihood of such an occurrence will be significantly reduced.

Addressing this may require changing the semantics of tailable-awaitData getMores on mongoD. Currently, if there are no matching entries, getMore will scan to the end of the oplog regardless of how long it takes. We could instead have it proactively break and return after the specific awaitData period expires, much like a non-throwing maxTimeMS deadline. This would also be consistent with mongoS' current behaviour.


Generated at Thu Feb 08 05:17:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.