
Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Component/s: Change Streams
Labels:
None

Driver Changes:
Needed
Quarter:
- FY23Q3

Driver Compliance:


Key	Status/Resolution	FixVersion
CDRIVER-4357	Won't Do
CXX-2489	Won't Do
CSHARP-4137	Won't Do
GODRIVER-2379	Won't Do
JAVA-4573	Won't Do
NODE-4183	Won't Do
MOTOR-931	Won't Do
PYTHON-3223	Won't Do
PHPLIB-840	Won't Do
RUBY-2955	Won't Do
RUST-1266	Won't Do
SWIFT-1544	Won't Do


Summary

What is the problem or use case, what are we trying to achieve?

Add an option to set independent batch sizes for change streams. This would be a new option, in addition to the BatchSize option that change stream options already have, that sets the cursor batch size on subsequent getMores.

This will allow us to set the initial aggregate batch size to 0 in cases where the aggregation takes a long time. However, we still need to be able to set a batch size for the subsequent getMores; with batchSize:0 for both, we would never return anything.
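To make the request concrete, here is a minimal sketch in Go of how two independent batch sizes could map onto the server commands. The `batchSizes` struct and both helper functions are hypothetical names used only for illustration; they are not part of any released driver API, and the fork linked in this ticket may do it differently. The command shapes follow the server's `aggregate` and `getMore` commands.

```go
package main

import "fmt"

// batchSizes is a HYPOTHETICAL option pair for illustration only.
// A nil field means "omit batchSize and let the server use its default".
type batchSizes struct {
	Initial *int32 // sent in the cursor document of the initial aggregate
	GetMore *int32 // sent with every subsequent getMore
}

// i32 is a small helper to take the address of an int32 literal.
func i32(v int32) *int32 { return &v }

// aggregateCommand builds the initial aggregate command document.
func aggregateCommand(coll string, pipeline []map[string]any, bs batchSizes) map[string]any {
	cursor := map[string]any{}
	if bs.Initial != nil {
		cursor["batchSize"] = *bs.Initial
	}
	return map[string]any{"aggregate": coll, "pipeline": pipeline, "cursor": cursor}
}

// getMoreCommand builds a getMore command document for an established cursor.
func getMoreCommand(cursorID int64, coll string, bs batchSizes) map[string]any {
	cmd := map[string]any{"getMore": cursorID, "collection": coll}
	if bs.GetMore != nil {
		cmd["batchSize"] = *bs.GetMore
	}
	return cmd
}

func main() {
	// Initial batch size 0 so the aggregate returns a cursor right away,
	// and a separate non-zero batch size for the getMores.
	bs := batchSizes{Initial: i32(0), GetMore: i32(100)}
	pipeline := []map[string]any{{"$changeStream": map[string]any{}}}

	fmt.Println(aggregateCommand("events", pipeline, bs))
	fmt.Println(getMoreCommand(12345, "events", bs))
}
```

With a single shared BatchSize, both commands would carry the same value, which is the limitation described above.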

The drivers don’t have a good way to auto-terminate a query until a cursor has been established, which only happens after the initial aggregate completes.

So in a case where we abandon the initial aggregate, or where the client crashes, it just keeps running to completion on the server. If we establish a cursor with batchSize:0, then when the driver abandons the cursor it will automatically issue an explicit killCursors to the server.
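The difference between the two situations can be sketched as follows. This is a simplified model, not driver code: `cleanupCommand` is a hypothetical helper, and a cursor ID of 0 stands in for "the aggregate has not returned a cursor yet". The `killCursors` command shape is the server's.

```go
package main

import "fmt"

// cleanupCommand models what a driver can send when the application
// abandons a change stream. HYPOTHETICAL helper for illustration only.
// It returns false when there is no cursor ID, because without one
// there is nothing the driver can name in a killCursors command.
func cleanupCommand(coll string, cursorID int64) (map[string]any, bool) {
	if cursorID == 0 {
		return nil, false
	}
	return map[string]any{
		"killCursors": coll,
		"cursors":     []int64{cursorID},
	}, true
}

func main() {
	// Case 1: the initial aggregate is still running, so no cursor exists.
	if _, ok := cleanupCommand("events", 0); !ok {
		fmt.Println("no cursor yet: nothing to kill, the aggregate keeps running")
	}

	// Case 2: the aggregate was sent with batchSize:0 and already
	// returned a cursor ID, so cleanup is possible.
	if cmd, ok := cleanupCommand("events", 12345); ok {
		fmt.Println("cursor established:", cmd)
	}
}
```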

For an example in the Go driver, here's what we implemented in Realm's forked Go driver repo: https://github.com/mongodb-forks/mongo-go-driver/pull/8/files

Motivation

Who is the affected end user?

Who are the stakeholders?

Users who use changestreams. In this specific case, users who use Realm triggers and sync.

How does this affect the end user?

Are they blocked? Are they annoyed? Are they confused?

I'll speak to how end users are impacted in the sense of triggers, but I imagine there's a larger use case here. For users who have a fairly large oplog and a very selective match expression, the trigger will fail while opening the change stream because of a timeout limit we enforce outside of the driver. We use the collection.Watch functionality, which attempts to open the change stream and return a cursor to us.

In the case of a timeout on the initial opening, the driver can't kill the aggregation because a cursor wasn't established. So the trigger will suspend, and the aggregation will continue to run on the user's cluster. The user can then restart the trigger, which kicks off a new aggregation that ultimately times out in the same way. This impacts the user's cluster because each aggregation runs to completion, consuming resources unnecessarily.

How likely is it that this problem or use case will occur?

Main path? Edge case?

This is an edge case for customers who have very large oplogs.

If the problem does occur, what are the consequences and how severe are they?

Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

Since the driver never gets a cursor back, it can't kill the old operations that were kicked off, so the severity largely depends on how many times a change stream is retried and fails. In the triggers case, a user can restart their trigger many times, which can lead to fairly severe consequences.

Is this issue urgent?

Does this ticket have a required timeline? What is it?

This isn't urgent since Realm has made this change in our forked Go driver repo.

Is this ticket required by a downstream team?

Needed by e.g. Atlas, Shell, Compass?

Realm

Is this ticket only for tests?

Does this ticket have any functional impact, or is it just test improvements?

This ticket is not only for tests.

related to

DRIVERS-1589 Allow changing BatchSize for getMores; needed when BatchSize(0) uses default size

Backlog

split to

CDRIVER-4357 Set 0 for batchsize for initial change stream

Closed

CSHARP-4137 Set 0 for batchsize for initial change stream

Closed

CXX-2489 Set 0 for batchsize for initial change stream

Closed

GODRIVER-2379 Set 0 for batchsize for initial change stream

Closed

JAVA-4573 Set 0 for batchsize for initial change stream

Closed

MOTOR-931 Set 0 for batchsize for initial change stream

Closed

NODE-4183 Set 0 for batchsize for initial change stream

Closed

PHPLIB-840 Set 0 for batchsize for initial change stream

Closed

PYTHON-3223 Set 0 for batchsize for initial change stream

Closed

RUBY-2955 Set 0 for batchsize for initial change stream

Closed

RUST-1266 Set 0 for batchsize for initial change stream

Closed

