C# Driver / CSHARP-741

Using InsertBatch with a very large dataset (~300MB CSV)

Details

    • Type: Task
    • Resolution: Done
    • Priority: Blocker - P1
    • None
    • 1.8.1
    • None

    Description

      I'm trying to use InsertBatch to process a ~300MB file. I use File.ReadLines to enumerate the lines in the file and turn them into BsonDocuments. I pass the resulting LINQ sequence to InsertBatch, and the call just hangs. The same code works fine with a small file.

      It does the same thing if I batch manually by using Skip & Take on the LINQ query that pulls lines out of the file. If I take the first 5000 lines and pass them to InsertBatch, the first call works, and the call for the second batch of 5000 just hangs.

      I thought it might be related to one of these issues:
      https://jira.mongodb.org/browse/CSHARP-717
      https://jira.mongodb.org/browse/CSHARP-715

      but am pretty stuck to be honest. Help!

      code snippet:

      var postcodesToInsert = File
          .ReadLines(csvFile)
          .Select(postcode => new BsonDocument
          {
              { "Postcode", postcode.Split(',')[0].Replace(" ", string.Empty).Replace("\"", string.Empty) }
          });

      postcodesCollection.InsertBatch(postcodesToInsert);

      thanks
      Nic
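
For reference, the manual batching described above can also be done without Skip & Take, which have to walk the file from the start again for every batch. The following is only a minimal sketch and is not confirmed to resolve the hang: `BatchingSketch` and `InBatches` are hypothetical names that are not part of the driver, and the batch size of 5000 is simply the figure mentioned in the description.

```csharp
using System;
using System.Collections.Generic;

public static class BatchingSketch
{
    // Hypothetical helper (not a driver API): splits a lazy sequence into
    // lists of at most batchSize items, enumerating the source exactly once.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");
        return InBatchesIterator(source, batchSize);
    }

    private static IEnumerable<List<T>> InBatchesIterator<T>(IEnumerable<T> source, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                // Hand the full batch to the caller, then start a fresh list
                // so the caller's batch is never mutated afterwards.
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Assuming the `postcodesToInsert` and `postcodesCollection` variables from the snippet above, each batch could then be passed to InsertBatch in turn, for example `foreach (var batch in BatchingSketch.InBatches(postcodesToInsert, 5000)) { postcodesCollection.InsertBatch(batch); }`. Only one batch of documents is held in memory at a time, and the file is read once from start to finish.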

          People

            Assignee: Robert Stam (robert@mongodb.com)
            Reporter: Nic Pillinger (nic_lsf)