[CXX-907] how to get object size when constructing update_one Created: 06/May/16  Updated: 11/Sep/19  Resolved: 09/May/16

Status: Closed
Project: C++ Driver
Component/s: API
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Judy Han [X] Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux



 Description   

I am starting to migrate from the legacy driver 1.0.6 to the new C++11 driver 3.0.1.
In the legacy driver, when we did a bulk write, we were able to keep size statistics for the object being updated, e.g.:

loadCountInBytes_ += record.objsize();
bulk.find(BSON("EventId" << eventIDStr)).upsert().replaceOne(record);

where the variable "record" is of type "mongo::BSONObj" (in our use case, it's an insert 99% of the time).
In the new C++11 driver, what is the recommended way to track the same object size statistic (bytes loaded), now that we are using the builders (a filter builder and an update builder)?
Thanks!
Judy



 Comments   
Comment by Judy Han [X] [ 25/May/16 ]

Hi David,
Thanks. Here are my answers:

  • Yes, "EventId" is unique. However, we may have to reload the same object in certain scenarios, e.g. if a bulk load fails for any reason, we may have to reload the whole bulk even though some of its objects are already loaded into MongoDB. And yes, that is the source of the duplicate key errors.
    Sorry, we recently made some changes, so the "EventId" field is gone and we now store the EventId in "_id". The answer above still holds, though.
  • Originally we relied on the server to do it, but now we use "_id" to hold the EventId. As for whether I care about the size of "_id", I would like your opinion: should I care, in which cases, and how large is the performance impact? Currently "_id" is a variable-length string of around 15 to 30 bytes.
  • I meant that the operation is expected to be idempotent and should not result in duplicate key errors if repeated. Thanks for the clarification.
    I am using mongod version 3.2.5, and yes, I am using WiredTiger as the storage engine with compression. This size is just for stats collection, so we can see how much compression we are achieving. It would be great if MongoDB already had this information.

Thanks!
Judy

Comment by David Golden [ 24/May/16 ]

It would help if you could describe your data model a bit better so that I can make a more helpful suggestion.

Particularly:

  • Is the "EventId" field expected to be unique? Is that the source of the duplicate key errors?
  • Are you generating _id yourself client-side or relying on the server to do it? Do you care about the size of _id in your record of object sizes?
  • When you say "allow duplicates", do you actually want duplicate documents for a given EventId? Or do you mean the operation is idempotent and won't result in duplicate key errors if repeated? Or something else?

Also, depending on what version of MongoDB you're using, particularly if you're using WiredTiger as a storage engine with compression, the size of the document in the database isn't the same as the uncompressed BSON document size anyway. Does that matter for your application?

Comment by Judy Han [X] [ 24/May/16 ]

Hi David,
I tried both replace_one and update_one. update_one allows duplicates, which is what I want, but replace_one errors out on duplicates, so it seems I cannot really use replace_one. Any suggestions?
Thanks!
Judy

Comment by David Golden [ 09/May/16 ]

replace_one will use slightly fewer bytes on the wire, since it doesn't have to wrap the document in a $set document.

Comment by Judy Han [X] [ 09/May/16 ]

Hi David,
Thanks for the suggestion! I will try that. I can use either update_one or replace_one. Looks like both of them are supported with bulk_write. Is there any performance difference one way or the other?
Thanks!
Judy

Comment by David Golden [ 09/May/16 ]

Hi, Judy. Do you need to use update_one? (Is there a possibility there is an existing document whose fields you want to preserve?) Otherwise, instead of using $set with update_one, you could use replace_one (still with upsert) in which case you would know the length of the replacement document.

Comment by Judy Han [X] [ 09/May/16 ]

Hi David,
Thank you very much for the suggestion. In my case I am doing an update. I have:

    bsoncxx::builder::stream::document filterBuilder{};
    bsoncxx::builder::stream::document eventUpdateBuilder{};
    using bsoncxx::builder::stream::open_document;
    using bsoncxx::builder::stream::close_document;
 
    filterBuilder << "EventId" << "12345678";
    eventUpdateBuilder << "$set" << open_document << "EventId" << "12345678" << "Content" << "Hello World!" << close_document;
 
    std::cout << eventUpdateBuilder.view().length() << std::endl;
    mongocxx::options::update opt{};
    opt.upsert(true);
    collection.update_one(filterBuilder.view(), eventUpdateBuilder.view(), opt);

I can print out "eventUpdateBuilder.view().length()", but that is not really the size of the final object, is it? So in this case, how do I get the size of the object?
Thanks a lot!
Judy
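
The length printed in the snippet above is that of the whole update document, including the "$set" wrapper. As a rough sketch (not part of the driver API; the function names below are hypothetical), the BSON layout lets you work out how much the wrapper adds: an embedded document costs a 1-byte type tag, the key plus its NUL terminator, and the outer document's own 4-byte length and 1-byte terminator. This assumes the update document holds exactly one operator and nothing else, and it does not count any "_id" the server adds on insert.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed bytes added when a document is embedded under `key` as the only
// field of a new outer document:
//   4 (outer int32 length) + 1 (type tag 0x03) + key + 1 (key NUL)
//   + 1 (outer terminating 0x00)
std::size_t wrapper_overhead(const std::string& key) {
    return 4 + 1 + key.size() + 1 + 1;
}

// Size of the outer document given the size of the embedded one.
std::size_t wrapped_document_size(const std::string& key, std::size_t inner_size) {
    return inner_size + wrapper_overhead(key);
}

// Inverse: size of the embedded document given the outer document's size.
std::size_t unwrapped_document_size(const std::string& key, std::size_t outer_size) {
    // The outer document must be at least as large as the wrapper plus an
    // empty embedded document (5 bytes).
    assert(outer_size >= wrapper_overhead(key) + 5);
    return outer_size - wrapper_overhead(key);
}
```

By this arithmetic the "$set" wrapper adds 11 bytes, so for the example above (an inner document of 53 bytes) the update document should come to 64 bytes, and `unwrapped_document_size("$set", 64)` gives back 53.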

Comment by David Golden [ 06/May/16 ]

Hi, Judy.

You can get the length from the length method on a document view:

#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>
#include <mongocxx/instance.hpp>

int main(int, char**) {
    mongocxx::instance inst{};

    // Build the document {"hello": "world"} with the stream builder.
    bsoncxx::builder::stream::document document{};
    document << "hello" << "world";

    // length() returns the size of the BSON document in bytes.
    std::cout << document.view().length() << std::endl;

    // Print the same document as JSON for reference.
    std::cout << bsoncxx::to_json(document.view()) << std::endl;
}

Output:

22
{
    "hello" : "world"
}
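
The 22 in the output above can be checked by hand against the BSON layout. The following is a minimal sketch, independent of the driver and with hypothetical function names, that handles only flat documents whose values are all UTF-8 strings: each string element takes a type byte, the NUL-terminated key, a 4-byte length, and the NUL-terminated value, and the document adds a 4-byte length prefix and a 1-byte terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One string-valued field: key -> UTF-8 value.
using StringField = std::pair<std::string, std::string>;

// Size in bytes of one BSON string element:
//   1 (type tag 0x02) + key + 1 (key NUL)
//   + 4 (int32 value length) + value + 1 (value NUL)
std::size_t bson_string_element_size(const StringField& field) {
    // BSON keys are C strings, so they cannot contain an embedded NUL.
    assert(field.first.find('\0') == std::string::npos);
    return 1 + field.first.size() + 1 + 4 + field.second.size() + 1;
}

// Size of a flat document whose values are all strings:
//   4 (int32 total length) + all elements + 1 (terminating 0x00)
std::size_t bson_string_document_size(const std::vector<StringField>& fields) {
    std::size_t total = 4 + 1;
    for (const auto& field : fields) {
        total += bson_string_element_size(field);
    }
    return total;
}
```

For {"hello": "world"} this gives 4 + (1 + 6 + 4 + 6) + 1 = 22, matching the driver's output. In real code, length() on the view remains the simpler way to get the number.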
