[SERVER-39410] Re-enable batching in DSCursor for change stream cursors Created: 07/Feb/19  Updated: 29/Oct/23  Resolved: 21/Feb/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.0.7, 4.1.9

Type: Improvement Priority: Major - P3
Reporter: Bernard Gorman Assignee: Bernard Gorman
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-38942 Improve robustness of postBatchResume... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0
Sprint: Query 2019-02-25
Participants:
Linked BF Score: 0

 Description   

Previously, for needsMerge:false streams on a single replica set, we permitted results to be batched at the DocumentSourceCursor level; in other words, we would buffer up to 4MB of data in the cursor before starting to pull those oplog events through the pipeline. We could do this because we did not need to track the oplog timestamp for single-replica-set streams. By contrast, for needsMerge:true streams that were producing data to be merged on mongoS, we had to include the $_internalLatestOplogTimestamp field. We therefore could not allow any batching, because pulling in 4MB of oplog events before starting to process the first of them would cause the latest oplog timestamp to jump 4MB ahead of the event actually being processed. Instead, we force the oplog scan to yield after every document, so the batch size is effectively 1 and the latest oplog timestamp stays in sync with the event being processed.
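
To make the trade-off above concrete, here is a minimal, self-contained sketch (OplogEvent, fillBatch and kMaxBatchBytes are hypothetical stand-ins, not the real DocumentSourceCursor interface): when the latest oplog timestamp has to stay in sync with the event being processed, the batch is cut off after a single document; otherwise it fills up to the 4MB limit.

{code:cpp}
// Illustrative sketch only; not the actual DocumentSourceCursor implementation.
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct OplogEvent {
    long long clusterTime;   // stand-in for the event's oplog timestamp
    std::vector<char> bson;  // stand-in for the raw BSON payload
};

constexpr std::size_t kMaxBatchBytes = 4 * 1024 * 1024;  // the 4MB batch limit

// Pull events from 'source' into a batch. When 'trackLatestOplogTime' is true
// (the needsMerge:true case described above), stop after a single event so the
// reported timestamp never runs ahead of the event being processed; otherwise
// keep filling the batch until it reaches 4MB.
template <typename Source>
std::vector<OplogEvent> fillBatch(Source& source,
                                  bool trackLatestOplogTime,
                                  std::optional<long long>& latestOplogTime) {
    std::vector<OplogEvent> batch;
    std::size_t bytes = 0;
    while (auto event = source.next()) {  // 'next()' yields std::optional<OplogEvent>
        bytes += event->bson.size();
        if (trackLatestOplogTime) {
            latestOplogTime = event->clusterTime;
        }
        batch.push_back(std::move(*event));
        if (trackLatestOplogTime || bytes >= kMaxBatchBytes) {
            break;  // effectively a batch size of 1 when tracking the timestamp
        }
    }
    return batch;
}
{code}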

But for the change stream high water mark project, streams on a single replica set must also track the latest oplog timestamp, since we need it in order to generate a high water mark token. This change within the SERVER-38408 commit therefore extends the "no batching" rule to single-replica-set streams as well. The upshot is that the stream is slower to return results (because we are yielding after every document) and more variable (because we are no longer guaranteed to return all available results, up to 4MB, on each getMore). This is also the underlying cause of SERVER-38942.

We should allow both sharded and unsharded change stream cursors to batch their results, in order to improve the latency and consistency with which change stream events are provided to the client.
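
One plausible way to re-enable batching while still producing a usable timestamp, assuming (as the ticket implies) that the timestamp only needs to be accurate at getMore boundaries, is to report the timestamp of the batch's final event alongside the batch itself: the client receives every event up to that point in the same response, so a high water mark derived from it never claims progress beyond what the client has seen. The sketch below is illustrative only; BatchedOplogReader, getMoreBatch and the stand-in types are hypothetical names, not the interface used in the linked commits.

{code:cpp}
// Illustrative sketch only; not the actual fix from the linked commits.
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct OplogEvent {
    long long clusterTime;   // stand-in for the event's oplog timestamp
    std::vector<char> bson;  // stand-in for the raw BSON payload
};

template <typename Source>
class BatchedOplogReader {
public:
    explicit BatchedOplogReader(Source& source) : _source(source) {}

    // Fill a batch up to 4MB and report the timestamp of the batch's final
    // event. Because the whole batch is returned to the client together, the
    // timestamp never advances past an event the client has yet to receive.
    std::pair<std::vector<OplogEvent>, std::optional<long long>> getMoreBatch() {
        static constexpr std::size_t kMaxBatchBytes = 4 * 1024 * 1024;
        std::vector<OplogEvent> batch;
        std::optional<long long> postBatchOplogTime;
        std::size_t bytes = 0;
        while (auto event = _source.next()) {
            bytes += event->bson.size();
            postBatchOplogTime = event->clusterTime;
            batch.push_back(std::move(*event));
            if (bytes >= kMaxBatchBytes) {
                break;
            }
        }
        return {std::move(batch), postBatchOplogTime};
    }

private:
    Source& _source;
};
{code}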



 Comments   
Comment by Githook User [ 01/Mar/19 ]

Author:

{'name': 'Bernard Gorman', 'email': 'bernard.gorman@gmail.com', 'username': 'gormanb'}

Message: SERVER-39410 Re-enable batching in DSCursor for change stream cursors

(cherry picked from commit 04882fa7f5210cfb14918ecddbbc5acbd88e86b6)
Branch: v4.0
https://github.com/mongodb/mongo/commit/204d63a92d588b9891277caf70a257b42f82ac32

Comment by Githook User [ 21/Feb/19 ]

Author:

{'name': 'Bernard Gorman', 'username': 'gormanb', 'email': 'bernard.gorman@gmail.com'}

Message: SERVER-39410 Re-enable batching in DSCursor for change stream cursors
Branch: master
https://github.com/mongodb/mongo/commit/04882fa7f5210cfb14918ecddbbc5acbd88e86b6
