Compass / COMPASS-7820

WritableCollectionStream keeps all errors in memory

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • 1.44.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Developer Tools
    • 3
    • Not Needed
    • Iteration Big Dipper, Iteration Jupiter

      WritableCollectionStream (used when importing documents) keeps all errors in memory: https://github.com/mongodb-js/compass/blob/800f04c1898b472cd8720caf66dc1013bae18122/packages/compass-import-export/src/utils/collection-stream.ts#L119

       

      And then we only write them to the log afterwards.

       

      So if you're importing 100k documents and get 100k errors, all of those errors are held in memory. It would be better to redesign this so that we stream errors straight to the log file rather than only writing them at the end. The easiest and most common way to hit this is to export a collection, so that every document in the file has an _id. By default the _id field has a unique index, so if you try to build a big collection by importing that same file multiple times, then from the second import onwards every single document fails with a duplicate key error.
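      A minimal sketch of what "stream errors straight to the log file" could look like, assuming the log is any Node.js Writable (for example a file write stream). ErrorLogWriter and its JSON-lines format are hypothetical names made up for illustration, not the existing Compass code. The point is that only a counter stays in memory, however many errors occur:

```typescript
import { Writable } from 'node:stream';
import { once } from 'node:events';

// Hypothetical helper (not part of compass-import-export): writes each
// error to the log as soon as it happens and keeps only a running count.
class ErrorLogWriter {
  private readonly output: Writable;
  private count = 0;

  constructor(output: Writable) {
    this.output = output;
  }

  // Serialise one error as a single JSON line and respect backpressure,
  // so a slow log file cannot cause unbounded buffering either.
  async write(error: Error, docIndex: number): Promise<void> {
    this.count += 1;
    const line =
      JSON.stringify({ index: docIndex, message: error.message }) + '\n';
    if (!this.output.write(line)) {
      await once(this.output, 'drain');
    }
  }

  // The import result only needs the number of errors, not the errors.
  get errorCount(): number {
    return this.count;
  }
}
```

      With something like this, 100k duplicate key errors cost one integer in memory plus whatever the stream is currently buffering, instead of 100k error objects.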

       

      We can change much of this code to use async iterators. See https://github.com/mongodb-js/compass/blob/0ebcfc762a1db31001914bb7cb40616a86775936/packages/compass-import-export/src/import/import-json.ts#L117-L129

       

          await pipeline(
            [
              input,
              new Utf8Validator(),
              byteCounter,
              stripBomStream(),
              ...parserStreams,
      
              // From here on we can just use `for await...of` on the preceding
              // streams, drop collectionStream (an instance of
              // WritableCollectionStream) entirely, and write the errors from
              // inside that loop. docStream and docStatsStream can then be
              // plain functions.
              docStream,
              docStatsStream,
              collectionStream,
            ],
            ...(abortSignal ? [{ signal: abortSignal }] : [])
          ); 
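      A rough sketch of the loop this could turn into. Node.js Readable streams are async iterable, so the output of the parser streams can be consumed with for await. Everything named here (importDocuments, insertDoc, logError, ImportResult) is a hypothetical stand-in for illustration, and the sketch inserts one document at a time rather than batching, which the real code would probably want to keep:

```typescript
type Doc = Record<string, unknown>;

interface ImportResult {
  processed: number;
  failed: number;
  aborted: boolean;
}

// Hypothetical replacement for the docStream / docStatsStream /
// collectionStream tail of the pipeline. Errors are handed to logError
// as they occur instead of being collected for a second pass.
async function importDocuments(
  docs: AsyncIterable<Doc>,
  insertDoc: (doc: Doc) => Promise<void>,
  logError: (error: Error, index: number) => Promise<void>,
  signal?: AbortSignal
): Promise<ImportResult> {
  let processed = 0;
  let failed = 0;

  for await (const doc of docs) {
    // Stop early if the caller aborted the import.
    if (signal?.aborted) {
      return { processed, failed, aborted: true };
    }
    try {
      await insertDoc(doc);
    } catch (err) {
      failed += 1;
      // `processed` is still the zero-based index of the current document.
      const error = err instanceof Error ? err : new Error(String(err));
      await logError(error, processed);
    }
    processed += 1;
  }

  return { processed, failed, aborted: false };
}
```

      Because the error is written inside the loop, nothing is left over to process once the loop finishes.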

      Then drop await processWriteStreamErrors() (see https://github.com/mongodb-js/compass/blob/0ebcfc762a1db31001914bb7cb40616a86775936/packages/compass-import-export/src/import/import-json.ts#L156-L160) because we'd have written the errors throughout the import rather than keeping them until the end to be written in a second step.

       

      Do the same for import-csv and import-json.

            Assignee:
            leroux.bodenstein@mongodb.com Le Roux Bodenstein
            Reporter:
            leroux.bodenstein@mongodb.com Le Roux Bodenstein
