[SERVER-6326] Batch inserts in C++ Created: 06/Jul/12  Updated: 24/Feb/17  Resolved: 24/Feb/17

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chad Tindel Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: cxxcopy
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

This question comes from one of our customers. Apologies for the late notice; we have a meeting with them at 2pm today.

I had a question (possibly bug report) about the MongoDB C++ driver. We wanted to switch from using:

insert(const string &ns, BSONObj obj, int flags)

to the "batched" version:

insert(const string &ns, const vector<BSONObj> &v, int flags)

for performance reasons.

However, we hit an issue pretty quickly. Under the covers, the version that takes the vector uses a BufBuilder to build up the request to send. In the 2.0.x versions of the API, the buffer inside BufBuilder does not grow in a particularly predictable fashion, and we found ourselves exceeding the buffer's 64MB limit pretty consistently. Consequently, we switched to using BufBuilder directly ourselves, effectively reverse engineering the growth algorithm to make sure we stuffed as much into each request as possible.
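To stay under a fixed message-size limit without reverse engineering the buffer internals, the documents can be split client-side into sub-batches whose combined serialized size fits within the cap, and each sub-batch passed to the vector overload of insert(). The sketch below is illustrative, not driver code: splitIntoBatches and kMaxBatchBytes are hypothetical names, and std::string stands in for a serialized BSONObj (only its byte size matters here).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cap standing in for the 64MB BufBuilder message limit.
const std::size_t kMaxBatchBytes = 64 * 1024 * 1024;

// Split documents into sub-batches whose combined size stays under
// maxBytes, so each batched insert() call fits in one message buffer.
// std::string stands in for a serialized BSONObj.
std::vector<std::vector<std::string> > splitIntoBatches(
        const std::vector<std::string>& docs, std::size_t maxBytes) {
    std::vector<std::vector<std::string> > batches;
    std::vector<std::string> current;
    std::size_t currentBytes = 0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        // Flush the current batch if adding this document would exceed the cap.
        if (!current.empty() && currentBytes + docs[i].size() > maxBytes) {
            batches.push_back(current);
            current.clear();
            currentBytes = 0;
        }
        current.push_back(docs[i]);
        currentBytes += docs[i].size();
    }
    if (!current.empty())
        batches.push_back(current);
    return batches;
}
```

Each resulting sub-batch could then be handed to insert(ns, batch, flags), keeping every request safely under the buffer limit regardless of how the buffer itself grows.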

I see the growth algorithm was changed in 2.1.0 so that the buffer's size grows in powers of two, which means I can remove the reverse engineering from our code.

However, there seems to be a loophole. The BufBuilder constructor takes an int argument specifying the buffer's initial size, and there is no maximum-size validation on it. I could therefore create a buffer larger than 64MB, and as long as my appends never cause it to grow, it will never throw an error about the size.
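The loophole is easiest to see when the size check lives only on the grow path. The toy model below is not the driver's actual implementation; ToyBufBuilder, reserve, and kBufferMaxSize are illustrative names. It doubles capacity in powers of two and checks the cap only when growing, so an oversized initial capacity sails through:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Illustrative cap standing in for the driver's 64MB BufBuilder limit.
const std::size_t kBufferMaxSize = 64 * 1024 * 1024;

// Toy model of 2.1.0-style growth: capacity doubles until it covers the
// requested size, and the max-size check runs only when growing.
struct ToyBufBuilder {
    std::size_t capacity;

    // No validation here: an initial size above the cap is accepted.
    explicit ToyBufBuilder(std::size_t initial) : capacity(initial) {}

    void reserve(std::size_t needed) {
        if (needed <= capacity)
            return;  // never grows, so the cap is never checked
        std::size_t newCap = capacity ? capacity : 1;
        while (newCap < needed)
            newCap *= 2;  // grow in powers of two
        if (newCap > kBufferMaxSize)
            throw std::length_error("buffer grow() exceeded max size");
        capacity = newCap;
    }
};
```

In this model, ToyBufBuilder(80 * 1024 * 1024) succeeds, and reserve() calls up to 80MB never trip the check, which mirrors the behavior described above.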

Is this loophole intentional or is this a bug?
What are the potential downsides to us exploiting it?



 Comments   
Comment by Eric Milkie [ 24/Feb/17 ]

The legacy C++ driver is currently receiving only critical bug and security fixes, not new features.
We encourage users to migrate to the new 'mongocxx' driver for much greater stability, configurability, and support for new MongoDB server features.
See the C++ driver docs site for details on mongocxx.

Comment by Greg Studer [ 09/Jul/12 ]

Another alternative would be to recompile mongodb with a different max size constant for the BufBuilder.

Comment by Greg Studer [ 09/Jul/12 ]

It's not intentional. Downsides to exploiting it are basically the downsides to exploiting unintentional behavior anywhere - it's very much not supported now or in the future. From a cursory look, it seems like it would work ok, but again, there could be dragons.

Is there a reason to suspect that batching >64MB will give a significant performance boost? Extremely small documents?

Generated at Thu Feb 08 03:11:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.