[CSHARP-2385] Support lazy enumeration in BulkWrite Created: 13/Sep/18  Updated: 08/Feb/23  Resolved: 14/Dec/20

Status: Closed
Project: C# Driver
Component/s: Write Operations
Affects Version/s: 2.7.0
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Daniel Hegener
Assignee: Unassigned
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CSHARP-1378 BulkWrite enumerates requests argumen... Backlog

 Description   

The BulkWrite method takes an IEnumerable for the bulk write requests. There are currently two issues with how the driver handles that IEnumerable. The first is that the IEnumerable is iterated multiple times. That will be covered within the scope of CSHARP-1378. The second is that the driver iterates the IEnumerable eagerly rather than lazily; lazy iteration would support streaming use cases where the IEnumerable is computed on demand. We will consider this second issue to be the scope of this Jira issue, and as such will no longer treat it as a duplicate of CSHARP-1378.

Original Description

The BulkWrite(Async) methods accept an IEnumerable which, unfortunately, gets enumerated multiple times. Here is the code:

public override BulkWriteResult<TDocument> BulkWrite(IEnumerable<WriteModel<TDocument>> requests, BulkWriteOptions options, CancellationToken cancellationToken)
{
    Ensure.IsNotNull(requests, nameof(requests));
    if (!requests.Any()) // THIS STARTS TO ENUMERATE
    {
        throw new ArgumentException("Must contain at least 1 request.", "requests");
    }
    options = options ?? new BulkWriteOptions();
    var operation = CreateBulkWriteOperation(requests, options);
    try
    {
        var result = ExecuteWriteOperation(operation, cancellationToken); // THIS ENUMERATES ALL THE WAY
        return BulkWriteResult<TDocument>.FromCore(result, requests); // THIS ENUMERATES ALL THE WAY
    }
    catch (MongoBulkWriteOperationException ex)
    {
        throw MongoBulkWriteException<TDocument>.FromCore(ex, requests.ToList());
    }
}

 
This is not a problem if someone passes in something like a List<WriteModel<TDocument>> or a WriteModel<TDocument>[]. If, however, someone passes a lazily enumerated IEnumerable (such as a LINQ query or something that uses yield) into any of the BulkWrite methods, we get a number of issues. The enumerator of the passed IEnumerable gets created and disposed three times. Also, n * 2 + 1 items (the +1 is for the .Any() call) get requested from (and potentially freshly created by) the enumerator, which may result in costly read operations and/or surprising side effects. This is unnecessarily slow and prevents streaming patterns from being implemented.
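To make the cost concrete, here is a minimal sketch that reproduces the access pattern of the method quoted above (one .Any() call followed by two full passes) against a lazily computed sequence. It does not use the driver at all; the names EnumerationCostDemo, LazyRequests and SimulateBulkWrite are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    // Counters updated by the lazy sequence below.
    public static int EnumerationsStarted;
    public static int ItemsProduced;

    // A lazily computed sequence. The body runs again for every new
    // enumeration, starting at the first MoveNext() call.
    public static IEnumerable<int> LazyRequests(int n)
    {
        EnumerationsStarted++;
        for (var i = 0; i < n; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // Hypothetical stand-in for the access pattern in BulkWrite:
    // Any(), then two complete passes over the same IEnumerable.
    public static void SimulateBulkWrite(IEnumerable<int> requests)
    {
        if (!requests.Any())
        {
            throw new ArgumentException("Must contain at least 1 request.", nameof(requests));
        }
        foreach (var item in requests) { } // stands in for ExecuteWriteOperation
        foreach (var item in requests) { } // stands in for BulkWriteResult.FromCore
    }
}
```

With n = 5, calling SimulateBulkWrite(LazyRequests(5)) starts three enumerations and produces 2 * 5 + 1 = 11 items, matching the count described above.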

So, I suggest either:
a) changing the API to accept a ReadOnlyCollection<WriteModel<TDocument>> instead of the IEnumerable<WriteModel<TDocument>>, which would clarify things (but would be a breaking change), or
b) optimizing the data flow so that the passed-in IEnumerable does not get evaluated more than once.
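One possible reading of option b) is to materialize the requests exactly once before anything else touches them. The sketch below is an assumption about how that could look, not the driver's implementation; SinglePass and Materialize are hypothetical names. Note that it only removes the repeated enumeration: the sequence is still consumed eagerly, so it does not enable streaming.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePass
{
    // Returns the input unchanged when it is already a materialized list
    // (List<T> and T[] both implement IReadOnlyList<T>); otherwise copies
    // the sequence into a list, enumerating the source exactly once.
    public static IReadOnlyList<T> Materialize<T>(IEnumerable<T> source)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        return source as IReadOnlyList<T> ?? source.ToList();
    }
}
```

Every later check (emptiness via Count, execution, result mapping) would then read from the materialized list rather than re-running the caller's iterator.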



 Comments   
Comment by Jeffrey Yemin [ 15/Dec/20 ]

Just to clarify: I updated the description to reflect the two aspects of the issue, and so this is no longer considered a duplicate of CSHARP-1378. Still, we've decided for now to close this issue as Won't Fix, as there is implementation complexity in supporting an "infinite" stream of write requests, and we don't think the complexity is warranted given the limited use cases for the feature. Also, no other MongoDB driver supports this use case, so it would make .NET somewhat of an outlier and may limit our flexibility going forward if we were to support this.

For applications that require lazy enumeration, it's always an option to implement it on top of the driver by batching requests into a series of BulkWrite operations. This has the benefit of giving applications greater control over batch sizes, which in the driver are hard-coded to 1000 requests per batch.
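A minimal sketch of that workaround might look like the following. It is an illustration under assumptions rather than a recommended implementation: LazyBatching and Batch are hypothetical names, and the helper itself has no dependency on the driver. It pulls items from the source only as each batch is requested, so the source is enumerated once and never held in memory as a whole.

```csharp
using System;
using System.Collections.Generic;

public static class LazyBatching
{
    // Splits a lazily enumerated source into lists of at most batchSize items.
    // Arguments are validated eagerly; the source is consumed lazily.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return BatchIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize); // fresh list so callers may keep the old one
            }
        }
        if (batch.Count > 0)
        {
            yield return batch; // final, possibly smaller, batch
        }
    }
}

// Usage with the driver (assumes an IMongoCollection<TDocument> named
// "collection" and a lazy IEnumerable<WriteModel<TDocument>> named "requests"):
//
// foreach (var batch in LazyBatching.Batch(requests, 500))
// {
//     collection.BulkWrite(batch);
// }
```

Each BulkWrite call here is an independent operation with its own result, so combining results and deciding how to react when one batch fails is left to the application. On .NET 6 and later, Enumerable.Chunk offers similar batching out of the box.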

Comment by Dmitry Lukyanov (Inactive) [ 14/Dec/20 ]

Supporting lazy enumeration for BulkWrite is not on our roadmap for now.

Comment by Christopher Lombardi [ 07/Nov/19 ]

Despite this being considered a "duplicate", can we receive a response regarding why the implementation enumerates the entire collection rather than streaming the requests and batching them as it goes?

Comment by Jeffrey Yemin [ 22/Oct/18 ]

Closing as a duplicate of CSHARP-1378.
