Using InsertBatch with a very large dataset (~300MB CSV)


    • Type: Task
    • Resolution: Done
    • Priority: Blocker - P1
    • Affects Version/s: 1.8.1
    • Component/s: None

      I'm trying to use InsertBatch to process a ~300MB file. I use File.ReadLines to enumerate the lines in the file and turn them into BsonDocuments. When I pass the LINQ enumerable to InsertBatch, it just hangs. It works fine with a small file.

      It does the same thing if I batch manually using Skip and Take on the LINQ query that pulls lines out of the file. If I take the first 5000 lines and pass them to InsertBatch, the first call works, but the second 5000-line batch just hangs.
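      For reference, the manual Skip/Take batching described above looks roughly like this. This is a hypothetical sketch, assuming the same csvFile path and postcodesCollection used in the snippet below, and a batch size of 5000 as in the report; each batch is materialized with ToList() before being handed to InsertBatch so the driver receives a fully realized batch rather than a lazy enumerable:

      ```csharp
      using System.IO;
      using System.Linq;
      using MongoDB.Bson;

      const int batchSize = 5000; // matches the 5000-line batches described above
      int skip = 0;

      while (true)
      {
          // Re-enumerate the file, skipping past batches already inserted.
          var batch = File
              .ReadLines(csvFile)
              .Skip(skip)
              .Take(batchSize)
              .Select(postcode => new BsonDocument
              {
                  { "Postcode", postcode.Split(',')[0]
                      .Replace(" ", string.Empty)
                      .Replace("\"", string.Empty) }
              })
              .ToList(); // materialize before inserting

          if (batch.Count == 0)
              break; // end of file reached

          postcodesCollection.InsertBatch(batch);
          skip += batchSize;
      }
      ```

      Note that Skip on File.ReadLines re-reads the file from the start on every iteration, so this pattern gets slower as skip grows; it only mirrors the approach described in the report.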

      I thought it might be related to one of these:
      https://jira.mongodb.org/browse/CSHARP-717
      https://jira.mongodb.org/browse/CSHARP-715

      but I'm pretty stuck, to be honest. Help!

      code snippet:

      var postcodesToInsert = File
          .ReadLines(csvFile)
          .Select(postcode => new BsonDocument
          {
              { "Postcode", postcode.Split(',')[0]
                  .Replace(" ", string.Empty)
                  .Replace("\"", string.Empty) }
          });

      postcodesCollection.InsertBatch(postcodesToInsert);

      thanks
      Nic

            Assignee:
            Robert Stam (Inactive)
            Reporter:
            Nic Pillinger
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: