[SERVER-40453] How to import 700+ million rows into MongoDB in minutes Created: 03/Apr/19  Updated: 03/Apr/19  Resolved: 03/Apr/19

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: sharad hadawale Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Description

MongoDB version 4.0.6 for Windows, 64-bit

Oracle Exadata

We have a 32-core Windows server with 96 GB RAM and a 5 TB HDD.

Approach 1 (using Oracle SQL*Loader)

We fetched the input data from the Oracle database.
We processed it and generated multiple TSV files.
Using threading, we imported the data into the Oracle database with SQL*Loader.
This took approximately 66 hours.

Approach 2 (using mongoimport)

We fetched the input data from the Oracle database.
We processed it and generated multiple TSV files.
Using threading, we imported the data into a MongoDB database with the mongoimport command-line utility (a sketch of this parallel import follows this list).
This took approximately 65 hours.
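
For reference, the parallel import can be driven from a small script. Below is a minimal Python sketch, assuming the generated TSV files sit in C:\data\tsv and reusing the mongoimport options shown later in this ticket; the directory, glob pattern, and pool size are illustrative assumptions, not the exact setup used in the POC.

# Minimal sketch: run several mongoimport processes in parallel, one per TSV file.
# The directory, glob pattern, and pool size are illustrative assumptions.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

TSV_DIR = Path(r"C:\data\tsv")  # assumed location of the generated TSV files

def import_file(tsv_path: Path) -> int:
    # Same options as the mongoimport command used in the POC.
    cmd = [
        "mongoimport",
        "--db", "abcDB",
        "--collection", "abcCollection",
        "--type", "tsv",
        "--headerline",
        "--numInsertionWorkers", "8",
        "--bypassDocumentValidation",
        "--file", str(tsv_path),
    ]
    return subprocess.run(cmd, check=True).returncode

# Four concurrent mongoimport processes; tune to the server's 32 cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(import_file, sorted(TSV_DIR.glob("*.tsv"))))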

No considerable difference in performance was observed.

We need to process 700+ million records; please suggest a better approach for optimized performance.

We fetch data from an Oracle database, process it in our application, and store the output in another database. This is an existing process that runs on Oracle, but it is time-consuming, so we decided to try MongoDB for a performance improvement.

We did one POC, where we did not see any considerable difference. We thought it might perform better on the server because of the hardware, so we repeated the POC on the server, where we got the above-mentioned result.

We believe MongoDB is more robust than the Oracle database, but we failed to get the desired result when comparing the stats.

Please find the MongoDB-related details of the production server below:

MongoImport Command

mongoimport --db abcDB --collection abcCollection --type tsv --file abc.tsv --headerline --numInsertionWorkers 8 --bypassDocumentValidation
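
If mongoimport itself turns out to be the bottleneck, driver-side bulk loading with unordered inserts is a common alternative worth measuring. Below is a minimal pymongo sketch, assuming a local standalone instance, the abcDB/abcCollection namespace from the command above, and an illustrative batch size of 10,000 documents:

# Sketch: bulk-load one TSV with unordered insert_many batches (pymongo).
# Connection string, file name, and batch size are assumptions.
import csv
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["abcDB"]["abcCollection"]

BATCH = 10_000
with open("abc.tsv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")  # header line supplies field names
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= BATCH:
            coll.insert_many(batch, ordered=False)  # unordered inserts avoid stalls
            batch = []
    if batch:
        coll.insert_many(batch, ordered=False)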

WiredTiger Configuration

storage:
  dbPath: C:\data\db
  journal:
    enabled: false
# engine:
# mmapv1:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 40
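
To verify that the 40 GB cache limit is actually in effect at runtime, the wiredTiger.cache section of serverStatus can be checked. A minimal sketch, assuming a local standalone instance on the default port:

# Sketch: confirm the configured WiredTiger cache size via serverStatus.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
print("maximum bytes configured:", cache["maximum bytes configured"])
print("bytes currently in the cache:", cache["bytes currently in the cache"])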

The approximate computation times were calculated from the process logs of the Oracle run and the MongoDB run.

The POC described above was carried out on the production server to compare the performance of Oracle (SQL*Loader) vs. MongoDB (mongoimport).

As we are using a standalone MongoDB instance for our POC, we have not set up sharding on the production server.

If we get the desired result with MongoDB, we can then come to a conclusion about migration.

Thank you in advance.



Comments
Comment by Eric Sedor [ 03/Apr/19 ]

Hello,

The SERVER project is for bugs and feature requests, so we aren't able to help with this matter here. We suggest reaching out to the MongoDB community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Our Technical Support page may also be helpful.
