[CXX-965] C++11 driver 3.0.1 so much slower than legacy C++ driver 2.6.11? Created: 05/Jul/16  Updated: 08/Jan/24  Resolved: 13/Jul/16

Status: Closed
Project: C++ Driver
Component/s: Performance
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: David Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2016-07-05 at 9.41.20 PM.png     PNG File Screen Shot 2016-07-05 at 9.47.54 PM.png     File main.cpp     File main2.cpp    
Issue Links:
Duplicate
duplicates CXX-344 Asynchronous (event driven) Driver Closed

 Description   

C++11 driver 3.0.1 is slower than the legacy C++ driver 2.6.11. We tried to write 2 million simple documents to MongoDB: with C++ driver 2.6.11 it takes 41 seconds, while 3.0.1 takes more than 189 seconds.



 Comments   
Comment by David [ 06/Jul/16 ]

OK, I understand: it's a synchronous method. I'll try to use insert_many() instead.

Comment by David Golden [ 06/Jul/16 ]

The 2.6.11 driver doesn't ask the DB for an acknowledgement of the write. Think of it like UDP. You send the packet and hope the other side gets it. The default with drivers now is to acknowledge writes, which means in addition to the actual time to send the document over the wire, the driver waits for the server to send an "ok" response (and has to deserialize and check it). That confirmation adds safety but also adds overhead, which is why batching with insert_many helps so much.
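As a rough per-document breakdown of the reported timings, assuming both runs inserted the same 2,000,000 documents (the 3.0.1 figure is a lower bound, since the report says "more than 189 seconds"):

```latex
\[
\frac{41\,\text{s}}{2\times10^{6}} = 20.5\,\mu\text{s per document (2.6.11)},\qquad
\frac{189\,\text{s}}{2\times10^{6}} = 94.5\,\mu\text{s per document (3.0.1)}
\]
\[
94.5\,\mu\text{s} - 20.5\,\mu\text{s} = 74\,\mu\text{s of extra cost per document}
\]
```

An extra cost of at least 74 µs on every single insert is consistent with one additional acknowledged round trip (plus reply parsing) per insert_one call, which is the overhead described above.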

I'm not sure what you mean by "save objects to database immediately" – whether you mean that every line must be individually inserted before processing the next line or whether you mean there's just too much data to create documents for the entire file and only insert_many at the end, or something else entirely.

One technique that some people in your situation use is to batch at least some of the insertions during processing. I.e. queue up a few hundred or a thousand documents, and then only issue an insert_many periodically through the loop (or file). If you batched every 1000 (assuming they all fit into a single write of < 16MB), then rather than 2 million database roundtrips inserting line by line, you'd have 2 thousand roundtrips.
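The periodic-batching technique can be sketched in plain C++ as a small buffer that hands off every N documents. This is a minimal sketch, not part of mongocxx: `batch_inserter` and its flush callback are hypothetical names, and in real code the callback would pass the buffered documents to something like `collection.insert_many(...)`. It keeps the parser's line-by-line loop intact while cutting the number of round trips.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not a mongocxx class): buffers documents and hands
// them to a caller-supplied flush callback once `batch_size` have queued up.
// In real code the callback would typically call collection.insert_many(batch).
template <typename Doc>
class batch_inserter {
public:
    using flush_fn = std::function<void(std::vector<Doc>&)>;

    batch_inserter(std::size_t batch_size, flush_fn flush)
        : batch_size_(batch_size), flush_(std::move(flush)) {
        buffer_.reserve(batch_size_);
    }

    // Queue one document (e.g. one parsed XML element); when the buffer
    // reaches batch_size, send the whole batch in a single call.
    void add(Doc doc) {
        buffer_.push_back(std::move(doc));
        if (buffer_.size() >= batch_size_) {
            flush();
        }
    }

    // Send whatever is buffered. Call this once more after the parse loop
    // ends so a final partial batch is not lost. If the callback throws,
    // the buffer is left intact so the caller can retry.
    void flush() {
        if (buffer_.empty()) {
            return;
        }
        flush_(buffer_);
        buffer_.clear();
    }

private:
    std::size_t batch_size_;
    flush_fn flush_;
    std::vector<Doc> buffer_;
};
```

Usage: construct it once before parsing with a batch size such as 1000, call `add()` for each object as the XML reader produces it, and call `flush()` after the last line. With 2,000,000 documents and a batch size of 1000, the callback (and therefore the database round trip) runs 2,000 times instead of 2,000,000. The trade-off is that up to one batch of documents sits in memory, not yet written, at any moment.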

Comment by David [ 06/Jul/16 ]

Here is the sample code, with an index created on 'age'.
Actually, we are parsing large XML files line by line with an XML reader, storing each tag as an object and building relationships between them; about 2 million objects/relationships are created while the XML is parsed. So we cannot use insert_many(), only insert_one(), because we need to save each object to the database as soon as it is created. But with C++ driver 2.6.11 the insert() method is faster (41 s). Does it run asynchronously, or does it act like insert_many() does in 3.0.1?

Comment by David Golden [ 05/Jul/16 ]

Hello. Thanks for contacting us. Without seeing your sample code for the two cases, we can't give you much guidance beyond what we see in the screen shots.

I suspect what you're seeing is the change to having acknowledged writes be the default insert behavior for all drivers. There are two articles discussing the evolution from the 2.6 driver to the "legacy" driver to the new C++11 driver that you might wish to review:

With modern drivers, we recommend the use of insert_many when "bulk loading" documents rather than repeated calls to insert_one to avoid a database round-trip for every document. If you retry your test with insert_many, you should see a significant improvement.

Generated at Wed Feb 07 22:00:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.