[JAVA-3810] Performance bottleneck in bulk insertion Created: 10/Aug/20  Updated: 27/Oct/23  Resolved: 18/Aug/20

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 3.12.6
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aziz Zitouni Assignee: Ross Lawley
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

-Server Specs

Processors: 2 x Intel Xeon E5-2640 2.50GHz
Memory: 8GB RDIMM, 1333 MHz (32GB RAM total)
Network Card Speed: Broadcom 5720 QP 1Gb Network Daughter Card
Operating System: Core OS
MongoDB Server Version: 3.6.2 (Docker hosted)

-Client Specs

Processors: Intel Core i7-4790 CPU @ 3.60GHz (8 CPUs), ~3.1GHz
Memory: 16GB RAM
Network Card: Intel® Ethernet Connection (2) I218-V, 1Gb
Operating System: Windows Server 2012 R2 Standard

Average data transfer rate: 200 MB/s



 Description   

Sample code:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.InsertOneModel;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BulkInsertBenchmark {

    public static void main(String[] args) {
        bulkInsert();
    }

    public static void bulkInsert() {
        MongoClient mongoClient = new MongoClient(new MongoClientURI("mongodb://192.168.140.129:27017/"));
        // w:0 (unacknowledged): bulkWrite returns without waiting for the server to acknowledge the writes
        WriteConcern wc = new WriteConcern(0).withJournal(false);

        String databaseName = "test";
        String collectionName = "testCollection";

        System.out.println("Database: " + databaseName);
        System.out.println("Collection: " + collectionName);
        System.out.println("Write concern: " + wc);

        MongoDatabase database = mongoClient.getDatabase(databaseName);
        MongoCollection<Document> collection = database.getCollection(collectionName).withWriteConcern(wc);

        int rows = 1000000;
        int iterations = 5;
        double accTime = 0;

        for (int it = 0; it < iterations; it++) {
            database.drop(); // start each iteration from an empty database

            List<InsertOneModel<Document>> docs = new ArrayList<>();
            int batchSize = 1000; // varied between 100, 1000, 5000 and 10000 for the results below
            int batch = 0;

            // The timed region covers both building the documents and sending the bulk writes
            long start = System.currentTimeMillis();

            for (int i = 0; i < rows; ++i) {
                String key1 = "7";
                String key2 = "8395829";
                String key3 = "928749";
                String key4 = "9";
                String key5 = "28";
                String key6 = "44923.59";
                String key7 = "0.094";
                String key8 = "0.29";
                String key9 = "e";
                String key10 = "r";
                String key11 = "2020-03-16";
                String key12 = "2020-03-16";
                String key13 = "2020-03-16";
                String key14 = "klajdlfaijdliffna";
                String key15 = "933490";
                String key17 = "paorgpaomrgpoapmgmmpagm";

                Document doc = new Document("key17", key17).append("key12", key12).append("key7", key7)
                        .append("key6", key6).append("key4", key4).append("key10", key10).append("key1", key1)
                        .append("key2", key2).append("key5", key5).append("key13", key13).append("key9", key9)
                        .append("key11", key11).append("key14", key14).append("key15", key15).append("key3", key3)
                        .append("key8", key8);

                docs.add(new InsertOneModel<>(doc));
                batch++;

                // Send a full batch
                if (batch >= batchSize) {
                    collection.bulkWrite(docs);
                    docs.clear();
                    batch = 0;
                }
            }

            // Send the final partial batch, if any
            if (batch > 0) {
                collection.bulkWrite(docs);
                docs.clear();
            }

            long end = System.currentTimeMillis();
            double elapsedSecs = (end - start) / 1000.0;
            accTime += elapsedSecs;

            System.out.println("Iteration " + it + " - Elapsed: " + elapsedSecs + " seconds.");
        }

        System.out.println("Avg: " + (accTime / iterations) + " seconds.");

        mongoClient.close();
    }
}

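Bulk-insert performance does not improve when the batch size is increased to 5000 or above; both 5000 and 10000 are slower on average than 1000. The following are the execution times for batch sizes 100, 1000, 5000 and 10000 (1,000,000 documents per iteration, 5 iterations each).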

batch size 100
Iteration 0 - Elapsed: 10.418 seconds.
Iteration 1 - Elapsed: 10.09 seconds.
Iteration 2 - Elapsed: 10.385 seconds.
Iteration 3 - Elapsed: 9.806 seconds.
Iteration 4 - Elapsed: 9.979 seconds.
Avg: 10.1356 seconds.

batch size 1000
Iteration 0 - Elapsed: 6.99 seconds.
Iteration 1 - Elapsed: 6.41 seconds.
Iteration 2 - Elapsed: 6.654 seconds.
Iteration 3 - Elapsed: 6.845 seconds.
Iteration 4 - Elapsed: 6.736 seconds.
Avg: 6.726999999999999 seconds.

batch size 5000
Iteration 0 - Elapsed: 7.536 seconds.
Iteration 1 - Elapsed: 7.891 seconds.
Iteration 2 - Elapsed: 7.83 seconds.
Iteration 3 - Elapsed: 7.951 seconds.
Iteration 4 - Elapsed: 7.939 seconds.
Avg: 7.8294 seconds.

batch size 10000
Iteration 0 - Elapsed: 7.198 seconds.
Iteration 1 - Elapsed: 6.967 seconds.
Iteration 2 - Elapsed: 8.134 seconds.
Iteration 3 - Elapsed: 8.092 seconds.
Iteration 4 - Elapsed: 8.083 seconds.
Avg: 7.694799999999999 seconds.
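To compare the runs per call rather than per iteration, the averages above can be converted into the number of bulkWrite calls, documents per second and milliseconds per call. The sketch below only does that arithmetic for the reported 1,000,000 documents per iteration; the class and method names are illustrative, and the averages are copied from the results with 6.726999999999999 and 7.694799999999999 rounded to 6.727 and 7.6948. Because the benchmark timed the whole loop, these figures still include batch-building time.

```java
class BatchSizeArithmetic {

    static final int ROWS = 1_000_000; // documents inserted per iteration in the report

    /** Number of bulkWrite calls the benchmark loop makes: ceil(rows / batchSize). */
    static int bulkWriteCalls(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    /** Documents per second for a given average elapsed time in seconds. */
    static double docsPerSecond(int rows, double avgSeconds) {
        return rows / avgSeconds;
    }

    /** Average milliseconds per bulkWrite call (includes batch-building time, as measured). */
    static double msPerCall(int rows, int batchSize, double avgSeconds) {
        return avgSeconds * 1000.0 / bulkWriteCalls(rows, batchSize);
    }

    public static void main(String[] args) {
        int[] batchSizes = {100, 1000, 5000, 10000};
        double[] avgSeconds = {10.1356, 6.727, 7.8294, 7.6948}; // reported averages, rounded
        for (int k = 0; k < batchSizes.length; k++) {
            System.out.printf("batch %5d: %5d calls, %,.0f docs/s, %.2f ms/call%n",
                    batchSizes[k],
                    bulkWriteCalls(ROWS, batchSizes[k]),
                    docsPerSecond(ROWS, avgSeconds[k]),
                    msPerCall(ROWS, batchSizes[k], avgSeconds[k]));
        }
    }
}
```

Batch size 100 issues ten times as many calls as batch size 1000, which is consistent with it being the slowest run; beyond 1000 the call count keeps falling but the total time does not.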



 Comments   
Comment by Aziz Zitouni [ 12/Aug/20 ]

Hi Ross,

Thanks for the suggestions. I will make the necessary changes to the code and create a new topic on the community forums.

Regards,

Aziz

Comment by Ross Lawley [ 11/Aug/20 ]

Hi azitouni@magnitude.com, thank you for reaching out.

As this sounds like a support issue, I wanted to give you some resources to get this question answered more quickly:

  • Our MongoDB support portal, located at support.mongodb.com
  • Our MongoDB community portal, located here
  • If you are an Atlas customer, free support is offered 24/7 in the lower right-hand corner of the UI.

If you have already opened a support case and are not receiving sufficient help, please let me know and I can help escalate your issue.

Note that your benchmark also includes the time it takes to create the batch operations, which you may wish to exclude from the timings. There are also limits on the size of a bulk operation that can be sent to MongoDB, but the driver automatically splits any batch that exceeds them, so bulk writes of larger documents may require multiple network round trips. You could also use the Command Monitoring feature to gain insight into the timings, as it can show the network cost of each command.
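The first suggestion, excluding batch construction from the timings, could be applied along the lines of the sketch below. It is a driver-agnostic illustration, not a MongoDB API: `WriteOnlyTiming.timeWritesOnly` and its parameters are hypothetical names, and the `writer` callback stands in for the real write (in the sample code above it would be something like `batch -> collection.bulkWrite(batch)`). Only the time spent inside `writer` is accumulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.IntFunction;

class WriteOnlyTiming {

    /**
     * Builds `rows` documents in batches of `batchSize` and passes each batch to `writer`.
     * Building the documents is not timed; only the time spent inside `writer` is summed.
     * Returns the accumulated write time in seconds.
     */
    static <T> double timeWritesOnly(int rows, int batchSize,
                                     IntFunction<T> makeDoc, Consumer<List<T>> writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long writeNanos = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (int i = 0; i < rows; i++) {
            batch.add(makeDoc.apply(i));               // untimed: document construction
            if (batch.size() == batchSize || i == rows - 1) {
                long t0 = System.nanoTime();
                writer.accept(batch);                  // timed: the write itself
                writeNanos += System.nanoTime() - t0;
                batch = new ArrayList<>(batchSize);    // fresh list in case the writer keeps a reference
            }
        }
        return writeNanos / 1e9;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in writer that only counts documents; replace it with a real bulk write.
        long[] written = {0};
        double secs = WriteOnlyTiming.<String>timeWritesOnly(
                1_000_000, 1000, i -> "doc-" + i, batch -> written[0] += batch.size());
        System.out.println(written[0] + " documents written, write time " + secs + " s");
    }
}
```

Timing each write separately keeps at most one batch in memory, rather than pre-building all 1,000,000 documents up front; the cost is a `System.nanoTime()` pair per call, which is negligible next to a network round trip.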

Thank you!

Ross
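The automatic splitting mentioned in the comment above can be pictured with a simplified greedy sketch: documents are packed into one message until adding the next would exceed either a document-count limit or a byte-size limit. This is not the Java driver's actual implementation (which, among other things, also budgets for the command envelope); `BulkSplitSketch.split` is a hypothetical helper. The limits used in `main` are assumed to be the server-reported defaults for MongoDB 3.6 and later (maxWriteBatchSize 100,000 and maxMessageSizeBytes 48,000,000), and the 300-byte document size is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkSplitSketch {

    /**
     * Greedily groups document sizes (in bytes) into messages so that no message holds
     * more than maxCount documents or more than maxBytes of document data.
     * Returns the number of documents placed in each message.
     */
    static List<Integer> split(int[] docSizes, int maxCount, long maxBytes) {
        List<Integer> messages = new ArrayList<>();
        int count = 0;
        long bytes = 0;
        for (int size : docSizes) {
            if (size > maxBytes) {
                throw new IllegalArgumentException("document of " + size + " bytes exceeds maxBytes");
            }
            // Close the current message if adding this document would break either limit.
            if (count > 0 && (count == maxCount || bytes + size > maxBytes)) {
                messages.add(count);
                count = 0;
                bytes = 0;
            }
            count++;
            bytes += size;
        }
        if (count > 0) {
            messages.add(count);
        }
        return messages;
    }

    public static void main(String[] args) {
        // Hypothetical: a 10,000-document batch of ~300-byte documents under the assumed 3.6+ limits.
        int[] sizes = new int[10_000];
        Arrays.fill(sizes, 300);
        System.out.println(split(sizes, 100_000, 48_000_000L).size() + " message(s)"); // prints "1 message(s)"
    }
}
```

Under those assumptions a 10,000-document batch fits in a single message, so splitting would only come into play for much larger batches or documents.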

Generated at Thu Feb 08 09:00:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.