[CXX-586] bulk API input parameter does not have "fsync" or "journal", is fsync=false, journal=false always used? Created: 23/Apr/15  Updated: 11/Sep/19  Resolved: 24/Apr/15

Status: Closed
Project: C++ Driver
Component/s: API
Affects Version/s: legacy-1.0.0
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Judy Han [X] Assignee: Unassigned
Resolution: Done Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

It looks like there are two ways to do a bulk insert:
1. The older way, i.e. insert a vector via the mongo::DBClientConnection insert() API.
2. The newer way, i.e. use the mongo::BulkOperationBuilder insert() and execute() APIs.

To check errors for #1, we call getLastError(const std::string &db, bool fsync=false, bool journal=false, int writeconcern=0, int wtimeout=0).

To check errors for #2, we call mongo::BulkOperationBuilder execute(const WriteConcern *writeConcern, WriteResult *writeResult); the second parameter provides the error code.

For #2, the API allows us to specify a write concern, but not fsync or journal.
I have the following questions:

1. Is #2 the recommended way to do bulk operations?
2. If so, is there a way to specify fsync and journal when we use #2?
3. If we do not use the bulk API, i.e. we insert one document at a time, what is the recommended way to check errors?
4. Is it true that getLastError() is going to be deprecated?

Thanks!
Judy



 Comments   
Comment by Judy Han [X] [ 24/Apr/15 ]

Got it. Thanks a lot for the explanation! Feel free to close the ticket.
Judy

Comment by Adam Midvidy [ 24/Apr/15 ]

Judy, regarding your latest question: if requiresConfirmation() returns true, the driver does the equivalent of calling getLastError() for you when you execute the write. If requiresConfirmation() returns false, you need to call getLastError() yourself to check whether the write succeeded.
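
The rule in this comment can be sketched as a small self-contained model. This is not the driver's code: ModelWriteConcern, setW and mustCallGetLastError are hypothetical stand-ins, and the only fact taken from the driver documentation quoted in this ticket is that confirmation is skipped solely when w is explicitly set to 0.

```cpp
#include <cassert>

// Hypothetical stand-in for the driver's WriteConcern; it models only `w`.
struct ModelWriteConcern {
    bool hasW = false;  // true once w has been set explicitly
    int w = 1;          // placeholder value, ignored until hasW is true

    // Hypothetical setter; records that w was chosen explicitly.
    ModelWriteConcern& setW(int n) {
        assert(n >= 0);  // this model only handles numeric, non-negative w
        hasW = true;
        w = n;
        return *this;
    }

    // Documented rule: confirmation is skipped only when w is explicitly 0.
    bool requiresConfirmation() const { return !(hasW && w == 0); }
};

// True when the caller has to issue getLastError() by hand, i.e. when the
// write is executed without the driver confirming it.
inline bool mustCallGetLastError(const ModelWriteConcern& wc) {
    return !wc.requiresConfirmation();
}
```

Under this model, a default-constructed write concern never needs a manual getLastError() call; only one with w explicitly set to 0 does.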

Comment by Adam Midvidy [ 24/Apr/15 ]

For #1, when writing to a server with journaling enabled, fsync(true) and journal(true) have exactly the same effect: the write blocks until it has been written to the journal. fsync(true) ONLY has a different effect when journaling is NOT enabled on the server; then the write blocks until it is fsync'd to disk. Note that once a write is in the journal it cannot be lost: if the server crashes, it can reconstruct the write from the journal.
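
The cases described here can be laid out as a small decision function. This is only a sketch of the explanation in this ticket, not driver or server code: the names WaitsFor and durabilityWait are hypothetical, and combinations the ticket does not describe (journal(true) against a server without journaling, or both flags together, which is advised against elsewhere in this thread) are reported as NotCovered rather than guessed.

```cpp
#include <cassert>

// Hypothetical labels for what a write blocks on before it is acknowledged.
enum class WaitsFor { NoDurabilityWait, JournalWrite, DiskFsync, NotCovered };

// serverJournaling: whether the server runs with journaling enabled.
// fsync, journal:   the flags set on the write concern.
inline WaitsFor durabilityWait(bool serverJournaling, bool fsync, bool journal) {
    if (fsync && journal) {
        return WaitsFor::NotCovered;  // the two flags should not be combined
    }
    if (!fsync && !journal) {
        return WaitsFor::NoDurabilityWait;  // neither flag requested
    }
    if (serverJournaling) {
        return WaitsFor::JournalWrite;  // either flag: same effect
    }
    if (fsync) {
        return WaitsFor::DiskFsync;  // no journal: block until flushed to disk
    }
    return WaitsFor::NotCovered;  // journal(true) without server journaling
}
```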

For additional questions you have about MongoDB (not the C++ driver) I would reach out on the mongodb-user email list, as people there will be better equipped to answer your questions.

Comment by Judy Han [X] [ 24/Apr/15 ]

Hi Adam,
For #2, thanks for the suggestion. I ran some tests, and it looks like I have to use bulk.find().upsert().replaceOne() for my purpose. If I use bulk.find().replaceOne() directly, it only does the update and does not insert when the document does not already exist. Thanks.

For #1, when you say "the effect is the same", do you mean that setting journal(true) or fsync(true) has the same effect? I thought journal(true) does not guarantee a write to disk. Does fsync(true) mean that journal is automatically true? Did I understand that correctly?

Regarding the WriteConcern class, I have a follow-up question on:

bool mongo::WriteConcern::requiresConfirmation() 	const
    Whether we need to send getLastError for this WriteConcern.
    The only time we don't require confirmation is when w is explicitly set to 0.

Does that mean I still need to call getLastError() unless I explicitly set _w to 0? (I assume it's "_w" since there is no "w" in this class.) Please advise. Thanks!

Comment by Adam Midvidy [ 24/Apr/15 ]

Hi Judy,

First off, in regards to my previous answer, note that journal(true) and fsync(true) should not be used together. I would only set journal(true) - assuming you are running with journaling enabled, the effect is the same.

For your next question:
I would recommend using "bulk upserts" for your use-case. These are special updates that become inserts if the update is not matched.

example:

BulkOperationBuilder bulk(&conn, "mydb.mycollection", false);  // false = unordered

// Select on the uniquely indexed keys; upsert() turns an unmatched replace into an insert.
bulk.find(BSON("_id" << 1 << "myKeyWithUniqueIndex" << 2))
       .upsert()
       .replaceOne(BSON("_id" << 1 << "myKeyWithUniqueIndex" << 2 << "otherKey" << 3));
WriteResult res;
bulk.execute(&WriteConcern::journaled, &res);

Any keys that are uniquely indexed (e.g. _id) should go in the selector passed to bulk.find(). Then you can pass the full document to replaceOne(). With upsert(), if no document matches the selector, the document is inserted. Otherwise, the matching document is replaced, which avoids the "duplicate key" error.

You can read more about upserts here: http://docs.mongodb.org/manual/reference/method/Bulk.find.upsert/
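
To make the difference between a plain replace and an upsert concrete, here is a minimal in-memory model. It does not use the driver: ModelCollection, ReplaceOutcome and modelReplaceOne are hypothetical names, documents are reduced to a string keyed by an integer _id, and the `upsert` flag stands in for calling upsert() between find() and replaceOne().

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a collection: _id -> document body.
using ModelCollection = std::map<int, std::string>;

enum class ReplaceOutcome { Replaced, Inserted, NoMatch };

// Models find(selector).replaceOne(doc) when upsert is false, and
// find(selector).upsert().replaceOne(doc) when upsert is true.
inline ReplaceOutcome modelReplaceOne(ModelCollection& coll, int id,
                                      const std::string& doc, bool upsert) {
    auto it = coll.find(id);
    if (it != coll.end()) {
        it->second = doc;  // a matching document exists: replace it
        return ReplaceOutcome::Replaced;
    }
    if (!upsert) {
        return ReplaceOutcome::NoMatch;  // plain replace never inserts
    }
    coll.emplace(id, doc);  // upsert: an unmatched replace becomes an insert
    return ReplaceOutcome::Inserted;
}
```

In this model a repeated load of the same _id ends in Replaced rather than an error, which is why an upsert sidesteps the "duplicate key" failure.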

Comment by Judy Han [X] [ 24/Apr/15 ]

Hi Adam,
Thank you very much for the information!
I did not realize the WriteConcern object has the ability to set the fsync and journal parameters. Thanks!
Another question:
Is there a way for a bulk insert to ignore the "duplicate key" error? The reason I need the non-bulk insert is the bulk insert failure case, where I end up redoing the inserts one at a time just to figure out which ones are true failures and which come from duplicate loading (this may happen in our application, but we consider it a success).
Thanks!
Judy
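
One way to picture the filtering described in this question is the sketch below. It is an illustration only and not something this ticket recommends: it assumes the application can obtain one (index, code) pair per failed document, which the ticket does not confirm for the legacy driver, and ModelWriteError and trueFailures are hypothetical names. The value 11000 is the server's usual duplicate key error code (E11000).

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-document error record: position in the batch plus the
// server error code reported for that document.
struct ModelWriteError {
    int index;
    int code;
};

const int kDuplicateKeyCode = 11000;  // E11000 duplicate key error

// Returns only the errors that are not duplicate key errors, i.e. the ones
// an application that treats duplicate loading as success still has to handle.
inline std::vector<ModelWriteError> trueFailures(
        const std::vector<ModelWriteError>& errors) {
    std::vector<ModelWriteError> out;
    for (const ModelWriteError& e : errors) {
        if (e.code != kDuplicateKeyCode) {
            out.push_back(e);
        }
    }
    return out;
}
```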

Comment by Adam Midvidy [ 24/Apr/15 ]

Hi Judy,

(1) BulkOperationBuilder is definitely the recommended way to do bulk operations.
(2) You can specify fsync and journal by calling fsync(true) and journal(true) on the WriteConcern object you pass to BulkOperationBuilder::execute.
(3) I would recommend using the Bulk API, even if you are only inserting 1 document. Then you should not even have to use getLastError().
(4) I think getLastError() will continue to work for the foreseeable future but its use is discouraged. I would recommend using the Bulk API and passing the appropriate WriteConcern even when inserting small numbers of documents.

Please let me know if you have any more questions.
