[DRIVERS-1589] Allow changing BatchSize for getMores; needed when BatchSize(0) uses default size Created: 01/Mar/21  Updated: 24/Feb/23

Status: Backlog
Project: Drivers
Component/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Lorenz Huelsbergen Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Cloners
clones GODRIVER-1900 Add SetBatchSize method to driver.Bat... Closed
Depends
is depended on by PYTHON-2561 Remove CommandCursor.batch_size Closed
Related
related to GODRIVER-1605 Allow cursor options to be specified ... Closed
is related to PYTHON-2563 Deprecate CommandCursor.batch_size Closed
is related to DRIVERS-1992 Set 0 for batchsize for initial chang... Closed

 Description   

Problem Statement/Rationale

Currently, when the BatchSize(0) option is used to request an immediate cursor (with no documents in the first batch), the default batch size goes into effect for all subsequent getMores, and there is apparently no way to specify a custom BatchSize in this scenario.

(This is because BatchSize is overloaded: it both sets the batch size and, when zero, requests an immediate cursor on the initial aggregation.)
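The overloading can be pictured with a small model of which `batchSize` values reach the server. This is a simplified sketch, not the Go driver's code. It assumes that the driver copies BatchSize into the initial `aggregate` command's `cursor.batchSize` and forwards it to `getMore` only when it is positive. Omitting the field lets the server pick its default. `batchPlan` and `planFromBatchSize` are hypothetical names.

```go
package main

import "fmt"

// batchPlan records which batchSize values would be sent.
// A nil pointer means the field is omitted, so the server applies its default.
type batchPlan struct {
	Initial *int32 // cursor.batchSize on the initial aggregate command
	GetMore *int32 // batchSize on each subsequent getMore command
}

// planFromBatchSize is a hypothetical, simplified model of the current
// behavior: a single BatchSize value drives both the initial command and getMores.
func planFromBatchSize(batchSize *int32) batchPlan {
	var p batchPlan
	if batchSize == nil {
		return p // nothing set: server defaults everywhere
	}
	n := *batchSize
	p.Initial = &n // 0 here requests an immediate cursor with no documents
	if n > 0 {
		p.GetMore = &n // assumption: only a positive size is forwarded to getMore
	}
	return p
}

// describe renders an optional batch size for printing.
func describe(p *int32) string {
	if p == nil {
		return "server default"
	}
	return fmt.Sprint(*p)
}

func main() {
	for _, n := range []int32{0, 100} {
		n := n
		p := planFromBatchSize(&n)
		fmt.Printf("BatchSize(%d): initial=%s, getMore=%s\n", n, describe(p.Initial), describe(p.GetMore))
	}
}
```

Running it prints `BatchSize(0): initial=0, getMore=server default` and `BatchSize(100): initial=100, getMore=100`. Under this model, no single value yields both an empty first batch and a bounded getMore size.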

This is currently a problem for ADL: we need to use BatchSize(0), but subsequent batches can contain so many documents that our processing exceeds the server's cursor expiry threshold.

Proposed Change/Action

Support some mechanism to set BatchSize for subsequent getMores or, barring that, provide a new mechanism to get the first cursor without any documents and leave BatchSize for specifying all batch sizes.
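One way to decouple the two concerns is an explicit immediate-cursor flag alongside BatchSize. The sketch below is hypothetical: `aggregateOptions`, `ImmediateCursor`, and `commands` are illustrative names, not an existing driver API. The command documents are simplified Go maps rather than BSON. The sketch relies only on the server commands accepting `cursor.batchSize` on `aggregate` and `batchSize` on `getMore` as independent fields.

```go
package main

import "fmt"

// aggregateOptions is a hypothetical option set separating the two concerns;
// neither field name is a real driver API.
type aggregateOptions struct {
	ImmediateCursor bool   // request a cursor with an empty first batch
	BatchSize       *int32 // size for getMores (and for the first batch when not immediate)
}

// commands builds simplified aggregate and getMore command documents.
func commands(coll string, cursorID int64, opts aggregateOptions) (agg, getMore map[string]any) {
	cursor := map[string]any{}
	switch {
	case opts.ImmediateCursor:
		cursor["batchSize"] = int32(0) // empty first batch, cursor returned at once
	case opts.BatchSize != nil:
		cursor["batchSize"] = *opts.BatchSize
	}
	agg = map[string]any{"aggregate": coll, "pipeline": []any{}, "cursor": cursor}

	getMore = map[string]any{"getMore": cursorID, "collection": coll}
	if opts.BatchSize != nil && *opts.BatchSize > 0 {
		getMore["batchSize"] = *opts.BatchSize // custom size for every later batch
	}
	return agg, getMore
}

func main() {
	size := int32(50)
	// "events" and cursor ID 42 are placeholder values for illustration.
	agg, gm := commands("events", 42, aggregateOptions{ImmediateCursor: true, BatchSize: &size})
	fmt.Println(agg["cursor"], gm["batchSize"]) // prints: map[batchSize:0] 50
}
```

With the flag set, the first batch is empty while every getMore still carries the caller's size. This combination is the one the acceptance criteria ask for.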

Expected Impact

It would give callers the flexibility to request an immediate cursor while still controlling the size of subsequent batches, which is difficult or impossible today.

UX Notes (Acceptance Criteria)
Provide the functionality of BatchSize(0) for requesting an immediate cursor, together with the ability to specify a custom BatchSize for subsequent getMores/Nexts.

 




 Comments   
Comment by Lorenz Huelsbergen [ 15/Mar/21 ]

 

(FYI: it seems I was not receiving updates to the ticket for a while, but proper permissions now appear to have been restored.)

As for more context: ADL ideally needs two things when running an aggregation from a driver: 1) immediate access to the cursor without any documents; and 2) the ability to set a batch size other than the default.

Point 1 is necessary given ADL's model, in which a client query may be interrupted or canceled by the client at any point. We handle this on the Atlas side by killing the cursor, so we need the cursor immediately and cannot wait for the first batch of documents to return.

Point 2 is necessary because sometimes the batch size is too large: ADL receives more documents than it can process (consider many small writes to S3) before the Atlas cursor expires (after 10 minutes, I believe). If we can set a batch size, we can avoid this. Ideally we could set the batch size dynamically as the query progresses; however, setting it once statically should be sufficient for us to avoid this scenario.
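The dynamic variant, changing the batch size between getMores, can be illustrated with an in-memory stand-in. `simCursor` and its `SetBatchSize` method are hypothetical and only model the idea; they are not a driver type. A batch size of 0 stands in for the server default, which in reality is bounded by the server rather than truly unlimited.

```go
package main

import "fmt"

// simCursor is a hypothetical in-memory stand-in for a server-side cursor.
// It is not a driver type; it only illustrates adjusting the getMore
// batch size between calls.
type simCursor struct {
	remaining []int // documents not yet returned
	batchSize int   // documents per getMore; 0 models "server default" as all remaining
}

// SetBatchSize changes the size used by subsequent getMores.
func (c *simCursor) SetBatchSize(n int) { c.batchSize = n }

// getMore returns the next batch, honoring the current batch size.
func (c *simCursor) getMore() []int {
	n := len(c.remaining)
	if c.batchSize > 0 && c.batchSize < n {
		n = c.batchSize
	}
	batch := c.remaining[:n]
	c.remaining = c.remaining[n:]
	return batch
}

func main() {
	docs := make([]int, 10)
	for i := range docs {
		docs[i] = i
	}
	c := &simCursor{remaining: docs} // the empty first batch has already been returned
	c.SetBatchSize(4)
	fmt.Println(c.getMore()) // [0 1 2 3]
	c.SetBatchSize(2) // processing is slow: shrink the next batch
	fmt.Println(c.getMore()) // [4 5]
	c.SetBatchSize(0) // fall back to the modeled default
	fmt.Println(c.getMore()) // [6 7 8 9]
}
```

Shrinking the batch size mid-query keeps each batch small enough to process before the cursor times out. Setting it once up front is the static special case.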

Currently, points 1 and 2 are mutually exclusive: point 1 is conveyed with a batch size of zero, which rules out setting a batch size smaller than the default. Conversely, setting a batch size smaller than the default (but not zero) precludes getting a cursor immediately.

There are many ways to solve this on the driver side; for example, provide a separate aggregation option to get the cursor without any documents, and keep BatchSize for setting batch sizes other than the default.

Glad to provide more info where needed; please let me know.

 

Comment by Alexander Golin (Inactive) [ 15/Mar/21 ]

lorenz.huelsbergen, driver leads are discussing in triage and are wondering if you could update this ticket with more detail about the motivation for this request. Thank you!

Comment by Jeffrey Yemin [ 09/Mar/21 ]

The request is specific to ADL's use of the Go driver to communicate with backing Atlas clusters; ADL does not need this capability in all drivers. (Note that the issue was originally opened as GODRIVER-1900 and subsequently cloned here; I didn't realize that when we first triaged it.) While it may make sense to expose this capability in all drivers, I don't think the issue has been raised often enough in the general community to justify the effort.

Furthermore, this feature could be exposed through the existing unstable BatchCursor API in the Go driver. If this is time-sensitive for ADL, I suggest that we explore that route.

Generated at Thu Feb 08 08:23:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.