[CXX-741] Performance issue when parsing BSONObj to C++ object Created: 20/Nov/15  Updated: 11/Sep/19  Resolved: 26/Nov/15

Status: Closed
Project: C++ Driver
Component/s: BSON
Affects Version/s: legacy-1.0.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Trinh The Thanh Assignee: Unassigned
Resolution: Done Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Open SUSE 13.1 64bit



 Description   

I am trying to implement C++ code that fetches about 800,000 documents from a collection with 30 fields (types include int, double, and string). It takes more than 240 seconds.

To measure where the time goes, I separated the code into two sections. The first invokes DBClientConnection::query to fetch everything from the collection and returns a vector of BSONObjs; it takes 20 seconds. The second iterates over that vector and parses each BSONObj into a C++ object, using BSONObj::getField().Int() (or Double(), or String()). That part takes more than 200 seconds!

Please take a look and give me some advice on how to improve it.
Thank you.



 Comments   
Comment by Trinh The Thanh [ 25/Nov/15 ]

Yes, thank you very much!

Comment by Andrew Morrow (Inactive) [ 24/Nov/15 ]

Yes, that is the right idea, and it looks like it paid off (removed ~75% of your runtime overhead). You could avoid a few other less expensive copies by changing the for loop in getAll to read 'for (const auto& obj : bsonObjs)'.

Overall, there are probably other unnecessary copies happening, but you should be able to find those yourself. If you need further runtime improvements, I recommend using a profiler like callgrind or vtune.
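
To illustrate the range-for suggestion above, here is a minimal, self-contained sketch. It does not use the driver: `Tracked`, `Item`, `Result`, `sumByValue` and `sumByConstRef` are hypothetical names, and `Tracked` is only a stand-in for an element type whose copies you want to count. It shows that declaring the loop variable by value copies every element, while `const auto&` copies none.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical payload type that counts how often it is copied.
struct Tracked {
    static std::size_t copies;
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(other.value) {}
    Tracked& operator=(const Tracked& other) {
        value = other.value;
        ++copies;
        return *this;
    }
    Tracked& operator=(Tracked&& other) noexcept {
        value = other.value;
        return *this;
    }
};
std::size_t Tracked::copies = 0;

// Same shape as the (object, timestamp) pairs in getAll, with an int
// standing in for the timestamp.
using Item = std::pair<Tracked, int>;

struct Result {
    long sum;
    std::size_t copies;
};

// Loop variable declared by value: each element is copied into `obj`.
Result sumByValue(const std::vector<Item>& items) {
    Tracked::copies = 0;
    long sum = 0;
    for (Item obj : items) sum += obj.first.value;
    return Result{sum, Tracked::copies};
}

// Loop variable declared as a const reference: no element is copied.
Result sumByConstRef(const std::vector<Item>& items) {
    Tracked::copies = 0;
    long sum = 0;
    for (const auto& obj : items) sum += obj.first.value;
    return Result{sum, Tracked::copies};
}
```

Both functions compute the same sum; only the number of copies differs. How much a copy costs depends on the element type, which is why a profiler is still the right tool for deciding whether it matters in a given program.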

Comment by Trinh The Thanh [ 24/Nov/15 ]

Thank you for your support.
I applied both changes: iterating over the BSON object's fields, and using emplace_back rather than push_back. As a result the time decreased: converting the bsonObjs to entities now takes about 50 seconds.
Changed code:

fromBSON

void Entity::fromBSONObj(const BSONObj& bson)
{
    for( BSONObj::iterator i = bson.begin(); i.more(); ) {
        BSONElement e = i.next();
        const char* fieldName = e.fieldName();
        if(strcmp(fieldName, "id") == 0) id = e.Int();
        else if(strcmp(fieldName, "field_1") == 0) field_1 = e.Int();
        ........
   }
}

emplace_back getAll

int index = 0;
            for (std::pair<BSONObj, boost::posix_time::ptime> obj: bsonObjs) {
                entitisPtr->emplace_back();
 
                (*entitisPtr)[index].fromBSONObj(obj.first);
                (*entitisPtr)[index].setTimeStamp(obj.second);
 
                index ++;
            }

Is this what you meant?

Comment by Andrew Morrow (Inactive) [ 23/Nov/15 ]

The issue in your Entity::fromBSONObj function is that BSONObj::operator[] is not constant time - it requires a linear scan and field name comparison of each field from the beginning of the object. As you are doing that repeatedly for each field you want to extract, the method is very slow. A better way to do this is to reverse the flow of control. Iterate over the BSON fields in the BSON object, and fill in the fields in your object as you find them. You are also, in general, making too many copies of things. At the very least you should be using emplace_back to move your newly populated entity into the vector in 'getAll'.
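
The difference between the two approaches described above can be sketched without the driver. The following is a simplified model, not the driver's API: `Doc` is a hypothetical stand-in for a BSON document (an ordered list of name/value pairs that can only be searched front to back), and `MiniEntity`, `getField`, `fromDocByLookup` and `fromDocByIteration` are illustrative names. The `visits` counter records how many document elements each strategy has to examine.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a BSON document: an ordered list of
// (field name, value) pairs that can only be searched front to back.
using Doc = std::vector<std::pair<std::string, int>>;

struct MiniEntity {
    int id = 0;
    int field_1 = 0;
    int field_2 = 0;
};

// Lookup by name: scans from the first element until the name matches.
// `visits` counts how many document elements were examined.
int getField(const Doc& doc, const char* name, std::size_t& visits) {
    for (const auto& kv : doc) {
        ++visits;
        if (kv.first == name) return kv.second;
    }
    return 0;  // field not present
}

// One lookup per member: every call rescans the document from the start.
MiniEntity fromDocByLookup(const Doc& doc, std::size_t& visits) {
    MiniEntity e;
    e.id = getField(doc, "id", visits);
    e.field_1 = getField(doc, "field_1", visits);
    e.field_2 = getField(doc, "field_2", visits);
    return e;
}

// Reversed control flow: walk the document once, dispatch on each name.
MiniEntity fromDocByIteration(const Doc& doc, std::size_t& visits) {
    MiniEntity e;
    for (const auto& kv : doc) {
        ++visits;
        const char* name = kv.first.c_str();
        if (std::strcmp(name, "id") == 0) e.id = kv.second;
        else if (std::strcmp(name, "field_1") == 0) e.field_1 = kv.second;
        else if (std::strcmp(name, "field_2") == 0) e.field_2 = kv.second;
    }
    return e;
}
```

In this model, a document whose N fields are stored in the same order they are looked up costs N(N+1)/2 element visits with per-field lookup, but only N visits with a single pass. The if/else chain in the single-pass version still compares field names, so it is not free, but the document itself is walked only once.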

Comment by Trinh The Thanh [ 23/Nov/15 ]

The console output is shown below (durations in milliseconds):

output

time getBSONObjAll: 26533 // time to fetch the BSONObjs
time bsonObjs = getBSONObjAll: 26533 // i.e. the time to assign the function's result is very small
time bsonObjs -> entity: 221539 // time to convert the BSONObjs to entities
get: 860910 tracks // number of documents in the collection
time: 249132 // total time

Comment by Trinh The Thanh [ 23/Nov/15 ]

The source code is as follows:

entity.h

#ifndef TEWATRACKMONGOENTITY_H_
#define TEWATRACKMONGOENTITY_H_
 
#include <string>
 
#include <boost/date_time/posix_time/posix_time.hpp>
#include <mongo/bson/bson.h>
 
using namespace mongo;
 
class Entity
{
public:
    Entity();
    BSONObj toBSONObj() const;
    void fromBSONObj(const BSONObj &bson);
    virtual ~Entity();
 
    void setTimeStamp(const boost::posix_time::ptime &t){
        this->mTimeStamp = t;
    }
 
    boost::posix_time::ptime getTimeStamp() const {
        return mTimeStamp;
    }
 
    void setCollectionName(const std::string &name){
        this->mCollectionName = name;
    }
 
    std::string getCollectionName() const {
        return mCollectionName;
    }
 
private:
    std::string mCollectionName;
    boost::posix_time::ptime mTimeStamp;
 
    int id;
    int field_1;
    int field_2;
    double field_3;
    double field_4;
    double field_5;
    double field_6;
    double field_7;
    double field_8;
    std::string field_9;
    double field_10;
    int field_11;
    int field_12;
    int field_13;
    int field_14;
    std::string field_15;
    int field_16;
    int field_17;
    std::string field_18;
    std::string field_19;
    std::string field_20;
    std::string field_21;
    std::string field_22;
    std::string field_23;
    double field_24;
    double field_25;
    int field_26;
    int field_27;
    int field_28;
    std::string field_29;
    std::string field_30;
    std::string field_31;
    std::string field_32;
    int field_33;
    std::string field_34;
    int field_35;
    int field_36;
    int field_37;
    int field_38;
    std::string field_39;
    std::string field_40;
    std::string field_41;
    int field_42;
    double field_43;
    int field_44;
    int field_45;
    int field_46;
    double field_47;
    std::string field_48;
    double field_49;
    int field_50;
    std::string field_51;
    std::string field_52;
    std::string field_53;
};
 
#endif /* TEWATRACKMONGOENTITY_H_ */

entity.cpp

#include <iostream>
 
#include "entity.h"
Entity::Entity()
{
    mCollectionName = "entity_replay";
}
 
BSONObj Entity::toBSONObj() const
{
    BSONObj bson;
    bson = BSON("id" << id
                << "field_1" << field_1
                << "field_2" << field_2
                << "field_3" << field_3
                << "field_4" << field_4
                << "field_5" << field_5
                << "field_6" << field_6
                << "field_7" << field_7
                << "field_8" << field_8
                << "field_9" << field_9
                << "field_10" << field_10
                << "field_11" << field_11
                << "field_12" << field_12
                << "field_13" << field_13
                << "field_14" << field_14
                << "field_15" << field_15
                << "field_16" << field_16
                << "field_17" << field_17
                << "field_18" << field_18
                << "field_19" << field_19
                << "field_20" << field_20
                << "field_21" << field_21
                << "field_22" << field_22
                << "field_23" << field_23
                << "field_24" << field_24
                << "field_25" << field_25
                << "field_26" << field_26
                << "field_27" << field_27
                << "field_28" << field_28
                << "field_29" << field_29
                << "field_30" << field_30
                << "field_31" << field_31
                << "field_32" << field_32
                << "field_33" << field_33
                << "field_34" << field_34
                << "field_35" << field_35
                << "field_36" << field_36
                << "field_37" << field_37
                << "field_38" << field_38
                << "field_39" << field_39
                << "field_40" << field_40
                << "field_41" << field_41
                << "field_42" << field_42
                << "field_43" << field_43
                << "field_44" << field_44
                << "field_45" << field_45
                << "field_46" << field_46
                << "field_47" << field_47
                << "field_48" << field_48
                << "field_49" << field_49
                << "field_50" << field_50
                << "field_51" << field_51
                << "field_52" << field_52
                << "field_53" << field_53);
    return bson;
}
 
 
 
 
void Entity::fromBSONObj(const BSONObj& bson)
{
    id = bson.getField("id").Int();
    field_1 = bson.getField("field_1").Int();
    field_2 = bson.getField("field_2").Int();
    field_3 = bson.getField("field_3").Double();
    field_4 = bson.getField("field_4").Double();
    field_5 = bson.getField("field_5").Double();
    field_6 = bson.getField("field_6").Double();
    field_7 = bson.getField("field_7").Double();
    field_8 = bson.getField("field_8").Double();
    field_9 = bson.getField("field_9").String();
    field_10 = bson.getField("field_10").Double();
    field_11 = bson.getField("field_11").Int();
    field_12 = bson.getField("field_12").Int();
    field_13 = bson.getField("field_13").Int();
    field_14 = bson.getField("field_14").Int();
    field_15 = bson.getField("field_15").String();
    field_16 = bson.getField("field_16").Int();
    field_17 = bson.getField("field_17").Int();
    field_18 = bson.getField("field_18").String();
    field_19 = bson.getField("field_19").String();
    field_20 = bson.getField("field_20").String();
    field_21 = bson.getField("field_21").String();
    field_22 = bson.getField("field_22").String();
    field_23 = bson.getField("field_23").String();
    field_24 = bson.getField("field_24").Double();
    field_25 = bson.getField("field_25").Double();
    field_26 = bson.getField("field_26").Int();
    field_27 = bson.getField("field_27").Int();
    field_28 = bson.getField("field_28").Int();
    field_29 = bson.getField("field_29").String();
    field_30 = bson.getField("field_30").String();
    field_31 = bson.getField("field_31").String();
    field_32 = bson.getField("field_32").String();
    field_33 = bson.getField("field_33").Int();
    field_34 = bson.getField("field_34").String();
    field_35 = bson.getField("field_35").Int();
    field_36 = bson.getField("field_36").Int();
    field_37 = bson.getField("field_37").Int();
    field_38 = bson.getField("field_38").Int();
    field_39 = bson.getField("field_39").String();
    field_40 = bson.getField("field_40").String();
    field_41 = bson.getField("field_41").String();
    field_42 = bson.getField("field_42").Int();
    field_43 = bson.getField("field_43").Double();
    field_44 = bson.getField("field_44").Int();
    field_45 = bson.getField("field_45").Int();
    field_46 = bson.getField("field_46").Int();
    field_47 = bson.getField("field_47").Double();
    field_48 = bson.getField("field_48").String();
    field_49 = bson.getField("field_49").Double();
    field_50 = bson.getField("field_50").Int();
    field_51 = bson.getField("field_51").String();
    field_52 = bson.getField("field_52").String();
    field_53 = bson.getField("field_53").String();
}
 
Entity::~Entity()
{
}

mongodbaccess.h getAll function

template <class T>
    std::unique_ptr<std::vector<T>> getAll(const std::string &collection,
                                           bool &isOk,
                                           std::string &errMsg) {
        std::unique_ptr<std::vector<T>> entitisPtr(new std::vector<T>());
        auto t0 = std::chrono::high_resolution_clock::now();
        auto bsonObjs = getBSONObjAll(collection, isOk, errMsg);
        auto t1 = std::chrono::high_resolution_clock::now();
        std::chrono::duration<float> fs = t1 - t0;
        std::chrono::milliseconds d = std::chrono::duration_cast<std::chrono::milliseconds>(fs);
        std::cout << "time bsonObjs = getBSONObjAll: " << d.count() << std::endl;
        if(isOk){
            T entity;
            for (std::pair<BSONObj, boost::posix_time::ptime> obj: bsonObjs) {
                entity.fromBSONObj(obj.first);
                entity.setTimeStamp(obj.second);
                entitisPtr->push_back(entity);
            }
        }
        auto t2 = std::chrono::high_resolution_clock::now();
        std::chrono::duration<float> fs1 = t2 - t1;
        std::chrono::milliseconds d1 = std::chrono::duration_cast<std::chrono::milliseconds>(fs1);
        std::cout << "time bsonObjs -> entity: " << d1.count() << std::endl;
 
        return std::move(entitisPtr);
    }

mongodbaccess.cpp getBSONObjAll function

std::vector<std::pair<BSONObj, boost::posix_time::ptime> > MongoDbAccess::getBSONObjAll(const std::string &collection, bool &isOk, std::string &errMsg)
{
    std::vector<std::pair<BSONObj, boost::posix_time::ptime> > bsonobjs;
    MongoConnection connection = MongoConnectionManager::getInstance()->getConnection(isOk, errMsg);
    if(isOk){
        try{
            auto t0 = std::chrono::high_resolution_clock::now();
            std::auto_ptr<DBClientCursor> cursor = connection->query(Utils::getNamespace(MongoConnectionManager::getInstance()->getDbInfo().db_name, collection));
 
            int count = 0;
            while ( cursor->more() ) {
                count++;
                BSONObj bsonObj = cursor->next();
                BSONObj dataObj = BSONObj(bsonObj[MongoDbAccess::dataField].value());
                boost::posix_time::ptime currentTime = Utils::convertString2Time(bsonObj[MongoDbAccess::timeStampField].String());
                bsonobjs.push_back(std::make_pair(dataObj.copy(), currentTime));
            }
 
            auto t1 = std::chrono::high_resolution_clock::now();
            std::chrono::duration<float> fs = t1 - t0;
            std::chrono::milliseconds d = std::chrono::duration_cast<std::chrono::milliseconds>(fs);
            std::cout << "time getBSONObjAll: " << d.count() << std::endl;
 
            if(count == 0){
                isOk = false;
                errMsg = "no such object to get.";
            } else {
                isOk = true;
            }
        }
        catch(DBException &ex){
            isOk = false;
            errMsg = "getbytimerange error: ";
            errMsg.append(ex.what());
        }
    }
    return bsonobjs;
}

main.cpp call getAll function

std::vector<Entity> mg_get_all() {
    bool isOk = false;
    std::string errMsg;
    auto t0 = Time::now();
    Entity tmp;
    auto rs = MongoDbAccess::getInstance()->getAll<Entity>(tmp.getCollectionName(), isOk, errMsg);
    auto t1 = Time::now();
    fsec fs = t1 - t0;
    ms d = std::chrono::duration_cast<ms>(fs);
    if(isOk) {
        std::cout << "get: " << rs->size() << " tracks" << std::endl;
        std::cout << "time: " << d.count() << std::endl;
    } else {
        std::cout << "get error: " << errMsg << std::endl;
    }
 
    return *rs;
}

Comment by Trinh The Thanh [ 23/Nov/15 ]

SCons build command: scons --c++11=on --sharedclient=SHAREDCLIENT

Comment by Andrew Morrow (Inactive) [ 20/Nov/15 ]

Could you also provide the SCons invocation you used to build the driver?

Comment by Andrew Morrow (Inactive) [ 20/Nov/15 ]

Could you provide the code and show how you are building the vector, and then how you are parsing each document?
