  1. C# Driver
  2. CSHARP-2385

Support lazy enumeration in BulkWrite

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 2.7.0
    • Component/s: Write Operations
    • Labels: None

      The BulkWrite method takes an IEnumerable of bulk write requests. The driver currently handles that IEnumerable in two problematic ways. First, it iterates the IEnumerable multiple times; that is covered by CSHARP-1378. Second, it iterates the IEnumerable eagerly rather than lazily, which rules out streaming use cases where the requests are computed on demand. This ticket covers only the second issue and is therefore no longer treated as a duplicate of CSHARP-1378.

      Original Description

      The BulkWrite(Async) methods accept an IEnumerable which, unfortunately, gets enumerated multiple times. Here is the code:

      public override BulkWriteResult<TDocument> BulkWrite(IEnumerable<WriteModel<TDocument>> requests, BulkWriteOptions options, CancellationToken cancellationToken)
      {
          Ensure.IsNotNull(requests, nameof(requests));
          if (!requests.Any()) // THIS STARTS TO ENUMERATE
          {
              throw new ArgumentException("Must contain at least 1 request.", "requests");
          }
          options = options ?? new BulkWriteOptions();
          var operation = CreateBulkWriteOperation(requests, options);
          try
          {
              var result = ExecuteWriteOperation(operation, cancellationToken); // THIS ENUMERATES ALL THE WAY
              return BulkWriteResult<TDocument>.FromCore(result, requests); // THIS ENUMERATES ALL THE WAY
          }
          catch (MongoBulkWriteOperationException ex)
          {
              throw MongoBulkWriteException<TDocument>.FromCore(ex, requests.ToList());
          }
      }
      

       
      This is not a problem if someone passes in something like a List<TDocument> or a TDocument[]. If, however, someone passes a lazily enumerated IEnumerable (such as a LINQ query or a method that uses yield) into any of the BulkWrite methods, several issues arise. The enumerator of the passed IEnumerable is created and disposed three times. In addition, n * 2 + 1 items (the +1 is for the .Any() call) are requested from, and potentially freshly created by, the enumerator, which may cause costly read operations and/or surprising side effects. This is unnecessarily slow and makes streaming patterns impossible to implement.
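
The counts above can be reproduced without the driver. The following is a minimal sketch, not driver code: `LazySource` and `SimulateBulkWrite` are hypothetical names, and `SimulateBulkWrite` only imitates the access pattern of the method quoted above (one `.Any()` call followed by two full passes).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumerationDemo
{
    public static int ItemsProduced;
    public static int EnumeratorsStarted;

    // Hypothetical lazy source: every enumeration re-runs this body from the top.
    public static IEnumerable<int> LazySource(int n)
    {
        EnumeratorsStarted++; // executes on the first MoveNext of each new enumerator
        for (var i = 0; i < n; i++)
        {
            ItemsProduced++;  // stands in for a costly read or a side effect
            yield return i;
        }
    }

    // Assumption: a stand-in that mimics how the quoted BulkWrite touches `requests`.
    public static void SimulateBulkWrite(IEnumerable<int> requests)
    {
        if (!requests.Any())               // pulls 1 item, then disposes the enumerator
        {
            throw new ArgumentException("Must contain at least 1 request.", nameof(requests));
        }
        var sent = requests.Count();       // first full pass (executing the operation)
        var mapped = requests.ToList();    // second full pass (building the result)
    }

    public static void Main()
    {
        SimulateBulkWrite(LazySource(3));
        // n = 3, so 2 * 3 + 1 = 7 items are produced by 3 separate enumerators.
        Console.WriteLine($"enumerators: {EnumeratorsStarted}, items: {ItemsProduced}");
        // prints "enumerators: 3, items: 7"
    }
}
```

Passing `LazySource(3).ToList()` instead would run the iterator body only once, at the cost of buffering every request in memory.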

      So I suggest either:
      a) changing the API to accept a ReadOnlyCollection<TDocument> instead of the IEnumerable<TDocument>, which would clarify things (but would be a breaking change), or
      b) optimizing the data flow so that the passed-in IEnumerable is not evaluated more than once.
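
Since the ticket was resolved as Won't Fix, a caller who needs streaming behaviour might work around it on their own side. The sketch below is one possible approach and not part of the driver: `InBatches` is a hypothetical helper that walks the lazy source exactly once and hands out fully materialized batches, each of which can then be passed to BulkWrite safely.

```csharp
using System;
using System.Collections.Generic;

public static class SinglePassBatching
{
    // Hypothetical helper: enumerates `source` exactly once, yielding lists of at most `batchSize` items.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return Iterate(source, batchSize);
    }

    private static IEnumerable<List<T>> Iterate<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;              // hand over a complete batch
                batch = new List<T>(batchSize);  // start a fresh list; the old one is never reused
            }
        }
        if (batch.Count > 0)
        {
            yield return batch;                  // final, possibly smaller batch
        }
    }
}

// Usage sketch (assumes an IMongoCollection<TDocument> named `collection`
// and a lazy IEnumerable<WriteModel<TDocument>> named `lazyRequests`):
//
//     foreach (var batch in SinglePassBatching.InBatches(lazyRequests, 1000))
//     {
//         collection.BulkWrite(batch);
//     }
```

Each batch is a List, so the repeated enumeration inside BulkWrite no longer touches the lazy source. The trade-off is that every batch becomes a separate bulk write, so error handling and ordering guarantees apply per batch rather than across the whole sequence.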

            Assignee: Unassigned
            Reporter: Daniel Hegener (daniel.hegener@gmx.net)
            Votes: 0
            Watchers: 2
