Core Server / SERVER-40453

How to import 700+ million rows into MongoDB in minutes

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Performance
    • Labels: None

      MongoDB 4.0.6 for Windows, 64-bit

      Oracle Exadata

      We have a 32-core Windows server with 96 GB RAM and a 5 TB HDD.

      Approach 1 (using Oracle SQL*Loader)

      We fetched the input data from the Oracle database.
      We processed it and generated multiple TSV files.
      Using threading, we import the data into the Oracle database with SQL*Loader.
      This takes approximately 66 hours.

      Approach 2 (using mongoimport)

      We fetched the input data from the Oracle database.
      We processed it and generated multiple TSV files.
      Using threading, we import the data into a MongoDB database with the mongoimport command-line utility.
      This takes approximately 65 hours.
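      The threaded mongoimport step described above can be sketched as follows. This is a minimal sketch, not our actual import code: the list of TSV files and the thread count are placeholders, while the database, collection, and flags match the mongoimport command shown later in this ticket.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_mongoimport_cmd(tsv_path, db="abcDB", coll="abcCollection", workers=8):
    """Build the mongoimport command line for one TSV file."""
    return [
        "mongoimport",
        "--db", db,
        "--collection", coll,
        "--type", "tsv",
        "--file", tsv_path,
        "--headerline",
        "--numInsertionWorkers", str(workers),
        "--bypassDocumentValidation",
    ]

def import_all(tsv_files, max_threads=4):
    """Run one mongoimport process per TSV file, a few at a time."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        codes = pool.map(
            lambda f: subprocess.run(build_mongoimport_cmd(f)).returncode,
            tsv_files,
        )
    return list(codes)
```

      Each thread launches a separate mongoimport process, so total parallelism is max_threads processes times numInsertionWorkers insertion workers per process.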

      No considerable difference in performance was observed.

      We need to process 700+ million records; please suggest a better approach for optimized performance.
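      For scale, a back-of-the-envelope check (not a measurement) of what the numbers quoted above imply: 700 million rows over roughly 65 hours is only about 3,000 rows per second.

```python
# Rough throughput implied by the figures quoted in this ticket.
rows = 700_000_000
hours = 65
rows_per_second = rows / (hours * 3600)
print(f"{rows_per_second:,.0f} rows/sec")  # roughly 2,991 rows/sec
```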

      We fetch from an Oracle database, process the data in our application, and store the output in another database. This is an existing process that runs on Oracle, but it is time-consuming, so we decided to try MongoDB for a performance improvement.

      We ran one POC, in which we did not see any considerable difference. We thought better hardware might help, so we repeated the POC on the server, where we got the above-mentioned result.

      We expected MongoDB to be more robust than the Oracle database, but after comparing the stats we failed to get the desired result.

      MongoDB-related details of the production server:

      MongoImport Command

      mongoimport --db abcDB --collection abcCollection --type tsv --file abc.tsv --headerline --numInsertionWorkers 8 --bypassDocumentValidation
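      Running several mongoimport processes in parallel presupposes that the export has been split into multiple TSV files. A minimal sketch of such a splitter (the path and chunk size are hypothetical); it repeats the header line in every chunk so that --headerline still works on each file:

```python
import itertools

def split_tsv(path, rows_per_chunk=1_000_000):
    """Split a headered TSV into smaller files, repeating the header in each.

    Returns the list of chunk file names (path.0.tsv, path.1.tsv, ...).
    """
    chunks = []
    with open(path, encoding="utf-8") as src:
        header = src.readline()
        for i in itertools.count():
            rows = list(itertools.islice(src, rows_per_chunk))
            if not rows:
                break
            chunk_name = f"{path}.{i}.tsv"
            with open(chunk_name, "w", encoding="utf-8") as out:
                out.write(header)       # keep the header in every chunk
                out.writelines(rows)
            chunks.append(chunk_name)
    return chunks
```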

      WiredTiger Configuration

      storage:
        dbPath: C:\data\db
        journal:
          enabled: false
        # engine:
        # mmapv1:
        wiredTiger:
          engineConfig:
            cacheSizeGB: 40
      

      The approximate computation times were calculated from process log details for the Oracle run and the MongoDB run.

      The above POC, carried out on the production server, compares the performance of Oracle (SQL*Loader) vs. MongoDB (mongoimport).

      As we are using a standalone MongoDB instance for our POC, we have not set up any sharding on the production server.

      If we get the desired result with MongoDB, we can then decide on migration.

      Thank you in advance.

            Assignee:
            eric.sedor@mongodb.com Eric Sedor
            Reporter:
            sharad9606@gmail.com sharad hadawale
            Votes:
            0
            Watchers:
            6

              Created:
              Updated:
              Resolved: