[DRIVERS-1992] Set 0 for batchsize for initial change stream Created: 23/Nov/21  Updated: 08/Jul/22  Resolved: 08/Jul/22

Status: Closed
Project: Drivers
Component/s: Change Streams
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Tim Sedgwick Assignee: Boris Dogadov
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
split to CDRIVER-4357 Set 0 for batchsize for initial chang... Closed
split to CSHARP-4137 Set 0 for batchsize for initial chang... Closed
split to CXX-2489 Set 0 for batchsize for initial chang... Closed
split to GODRIVER-2379 Set 0 for batchsize for initial chang... Closed
split to JAVA-4573 Set 0 for batchsize for initial chang... Closed
split to MOTOR-931 Set 0 for batchsize for initial chang... Closed
split to NODE-4183 Set 0 for batchsize for initial chang... Closed
split to PHPLIB-840 Set 0 for batchsize for initial chang... Closed
split to PYTHON-3223 Set 0 for batchsize for initial chang... Closed
split to RUBY-2955 Set 0 for batchsize for initial chang... Closed
split to RUST-1266 Set 0 for batchsize for initial chang... Closed
Related
related to DRIVERS-1589 Allow changing BatchSize for getMores... Backlog
Driver Changes: Needed
Quarter: FY23Q3
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-4357 Won't Do
CXX-2489 Won't Do
CSHARP-4137 Won't Do
GODRIVER-2379 Won't Do
JAVA-4573 Won't Do
NODE-4183 Won't Do
MOTOR-931 Won't Do
PYTHON-3223 Won't Do
PHPLIB-840 Won't Do
RUBY-2955 Won't Do
RUST-1266 Won't Do
SWIFT-1544 Won't Do

 Description   

Summary

What is the problem or use case, what are we trying to achieve?

Add an option to set independent batch sizes for change streams. This would be a new option, in addition to the BatchSize currently on the change stream options, to set the cursor batch size on subsequent getMores.

This will allow us to set the initial aggregate batch size to 0 in cases where the aggregation takes a long time. However, we also need to be able to set a separate batch size for the subsequent getMores; with batchSize:0 for both, nothing would ever be returned.

 

The drivers don’t have a good way to auto-terminate a query until a cursor has been established, which only happens after the initial aggregate completes.

 

So in a case where we abandon the initial aggregate, or where the client crashes, it just keeps running to completion on the server. If we establish a cursor with batchSize:0, then when the driver abandons the cursor it will automatically issue an explicit killCursors to the server.
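
As a rough illustration of the flow described above, the following Go sketch builds the three server commands involved: the initial aggregate with batchSize 0, a getMore with its own batch size, and the killCursors that becomes possible once a cursor id exists. The command shapes follow the server's aggregate, getMore, and killCursors commands; the streamBatchSizes type, the helper names, the collection name, and the cursor id are hypothetical and are not part of any driver's API. Plain maps are used for readability only; a real driver sends ordered BSON documents with the command name as the first key.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// streamBatchSizes is a hypothetical pair of options; no driver exposes it under this name.
type streamBatchSizes struct {
	Initial int32 // batchSize sent with the initial aggregate
	GetMore int32 // batchSize sent with each subsequent getMore
}

// initialAggregate builds the aggregate command that opens the change stream.
// With Initial == 0 the server replies with a cursor id and an empty first batch.
func initialAggregate(coll string, sizes streamBatchSizes) map[string]any {
	return map[string]any{
		"aggregate": coll,
		"pipeline":  []any{map[string]any{"$changeStream": map[string]any{}}},
		"cursor":    map[string]any{"batchSize": sizes.Initial},
	}
}

// getMore builds the follow-up command, using the independent GetMore size.
func getMore(cursorID int64, coll string, sizes streamBatchSizes) map[string]any {
	return map[string]any{
		"getMore":    cursorID,
		"collection": coll,
		"batchSize":  sizes.GetMore,
	}
}

// killCursors builds the cleanup command; it needs a cursor id, which is why
// establishing the cursor early matters.
func killCursors(cursorID int64, coll string) map[string]any {
	return map[string]any{
		"killCursors": coll,
		"cursors":     []int64{cursorID},
	}
}

func main() {
	sizes := streamBatchSizes{Initial: 0, GetMore: 100}
	const cursorID = int64(42) // placeholder; the real id comes from the aggregate reply

	cmds := []map[string]any{
		initialAggregate("events", sizes),
		getMore(cursorID, "events", sizes),
		killCursors(cursorID, "events"),
	}
	for _, cmd := range cmds {
		out, err := json.Marshal(cmd)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(string(out))
	}
}
```

Keeping the two sizes in separate fields mirrors the request in this ticket: a single shared batch size cannot be 0 for the aggregate and non-zero for the getMores at the same time.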

For an example in the Go driver, here's what we implemented in Realm's forked Go driver repo: https://github.com/mongodb-forks/mongo-go-driver/pull/8/files

Motivation

Who is the affected end user?

Who are the stakeholders?

Users who use changestreams. In this specific case, users who use Realm triggers and sync.

How does this affect the end user?

Are they blocked? Are they annoyed? Are they confused?

I'll speak to how end users are impacted in the context of triggers, but I imagine there's a larger use case here. For users who have a fairly large oplog and a very selective match expression, the trigger will fail while opening the change stream because of a timeout limit we enforce outside of the driver. We use the collection.Watch functionality, which attempts to open the change stream and return a cursor to us. 

If the initial open times out, the driver can't kill the aggregation because no cursor was established. The trigger suspends, and the aggregation continues to run on the user's cluster. The user can then restart the trigger, which repeats the scenario above: a new aggregation is kicked off and ultimately times out as well. This hurts the user's cluster because each aggregation runs to completion, consuming resources unnecessarily.

How likely is it that this problem or use case will occur?

Main path? Edge case?

This is an edge case for customers who have very large oplogs.

If the problem does occur, what are the consequences and how severe are they?

Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

Because the driver never gets a cursor back, it can't kill the operations that were already kicked off, so the severity largely depends on how many times a change stream is retried and fails. In the triggers case, a user can restart their trigger many times, which can lead to fairly severe consequences. 

Is this issue urgent?

Does this ticket have a required timeline? What is it?

This isn't urgent, since Realm has made this change in our forked Go driver repo. 

Is this ticket required by a downstream team?

Needed by e.g. Atlas, Shell, Compass?

Realm

Is this ticket only for tests?

Does this ticket have any functional impact, or is it just test improvements?

This ticket is not only for tests.



 Comments   
Comment by Tim Sedgwick [ 23/Nov/21 ]

This is very similar to https://jira.mongodb.org/browse/DRIVERS-1589, although this would allow the consumer of the driver to set batchSize to values other than the 0 described in DRIVERS-1589.

 

GODRIVER-1900 works if you want to set the same batchSize for both the initial aggregation and the subsequent getMores. It does not cover the case where you want to set them independently. 

Comment by Shane Harvey [ 23/Nov/21 ]

This request is very similar to DRIVERS-1589, which the Go team has already added support for. Is the problem that GODRIVER-1900 does not work for change stream cursors?

Generated at Thu Feb 08 08:24:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.