[CXX-946] mongoclient.dll assertion exception Created: 21/Jun/16  Updated: 14/Sep/16  Resolved: 14/Sep/16

Status: Closed
Project: C++ Driver
Component/s: API
Affects Version/s: legacy-1.1.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mariano Botta Assignee: J Rassi
Resolution: Done Votes: 0
Labels: legacy-cxx
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongo.log    

 Description   

Hi all,
I’m having problems using MongoDB under high load.
The database has about 7200 collections, storing documents at different time intervals: every second in some cases, every minute in others. Each document has the following structure:
_id: timestamp
v1: double
v2: int
On each iteration, the application sends all the information to the database; there are no incomplete documents.
When I start using the application and requesting data, Mongo frequently returns an AssertionException with code 0 and the following message:
assertion src\mongo\client\dbclientcursor.cpp:228
From what I found, this message means that the cursor is no longer valid because it timed out. I configure the timeout when connecting to the mongo server. At the beginning, the timeout was 60 secs. I increased it to 90 secs but it still returns this error. I even tried disabling the timeout.
In other cases, the message is:
assertion src\mongo\util\net\message_port.cpp:278

Each request is handled by a different thread.
With low load, everything keeps running even though these errors are returned. With high load, my server randomly stops working on a request for data from some collection (not always the same one).
These errors started when we upgraded to mongo 3; I tested mongo 3.0.7, 3.2.0, 3.2.7 and 3.3.8, and the behavior was the same in all cases.
In the Windows Event Viewer, the error is:
Faulting application name: app.exe, version: 0.0.0.0, time stamp: 0x5758321a
Faulting module name: mongoclient.dll, version: 0.0.0.0, time stamp: 0x5761d4aa
Exception code: 0xc0000005
Fault offset: 0x00000000000e1f0d
Faulting process id: 0x44d4
Faulting application start time: 0x01d1cbfea9b06353
Faulting application path: C:\Program Files\MyApp\App\app.exe
Faulting module path: C:\Program Files\MyApp\App\mongoclient.dll
Report Id: 2d0ad6f9-37f4-11e6-babf-0aa31160729b

Environment:
Windows Server 2008 R2 64 bits.
RAM: 15 GB
Mongo C++ legacy driver 1.1.1, compiled with Visual Studio 2010 and boost 1.60.0.
MongoDB running at localhost, with no replica

Can anyone help me?



 Comments   
Comment by J Rassi [ 27/Jun/16 ]

Yep, connection objects in the C++ driver are not thread-safe. It seems like the issue that you are running into is that application threads are accessing 'clt_' without proper synchronization in the function that is issuing queries against the server. This is consistent with the observation you report that the issue goes away when you change your application to not use threads.

You can fix this issue by locking the client object before using it in your querying code. Note also that if multiple application threads need to communicate with the database at the same time, you will likely see much better performance by giving each application thread its own connection to the database, or by implementing your own connection pooling functionality. Unfortunately, the legacy C++ driver does not offer a native connection pool API. However, if you upgrade to the new C++11 driver, you will be able to use the mongocxx::pool class, which provides a thread-safe API to manage multiple connections to the same MongoDB deployment.
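
As a rough illustration of the "own connection pooling" option, here is a minimal sketch of a blocking pool. The names FakeConnection, SimplePool and Handle are hypothetical and are not part of the legacy driver or of mongocxx; FakeConnection only stands in for whatever client object the application uses. The sketch assumes a C++11 standard library; with an older toolchain, boost::mutex, boost::unique_lock and boost::condition_variable could play the same roles.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical connection type; a real application would wrap the
// client object provided by its driver here.
struct FakeConnection {
    int id;
    explicit FakeConnection(int i) : id(i) {}
};

template <typename Conn>
class SimplePool {
public:
    // RAII handle: gives the connection back to the pool when destroyed,
    // so a thread cannot forget to return it, even if an exception is thrown.
    class Handle {
    public:
        Handle(SimplePool& pool, std::unique_ptr<Conn> conn)
            : pool_(&pool), conn_(std::move(conn)) {}
        Handle(Handle&& other) noexcept
            : pool_(other.pool_), conn_(std::move(other.conn_)) {}
        Handle(const Handle&) = delete;
        Handle& operator=(const Handle&) = delete;
        ~Handle() { if (conn_) pool_->release(std::move(conn_)); }
        Conn& operator*() { return *conn_; }
        Conn* operator->() { return conn_.get(); }
    private:
        SimplePool* pool_;
        std::unique_ptr<Conn> conn_;
    };

    // Registers a connection with the pool.
    void add(std::unique_ptr<Conn> conn) { release(std::move(conn)); }

    // Blocks until a connection is idle, then hands it to the caller.
    // Each connection is used by at most one thread at a time.
    Handle acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !idle_.empty(); });
        std::unique_ptr<Conn> conn = std::move(idle_.back());
        idle_.pop_back();
        return Handle(*this, std::move(conn));
    }

    std::size_t idleCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    void release(std::unique_ptr<Conn> conn) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            idle_.push_back(std::move(conn));
        }
        available_.notify_one();
    }

    std::mutex mutex_;
    std::condition_variable available_;
    std::vector<std::unique_ptr<Conn>> idle_;
};
```

A worker thread would call acquire(), use the connection through the returned handle, and let the handle go out of scope when the request is finished. The pool size then bounds how many requests talk to the server concurrently.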

One additional note: do consider using boost::unique_lock or boost::lock_guard for locking mutexes in your code, instead of invoking boost::mutex::lock() directly. This will help make your code exception-safe (in its current state, '_mutex' will remain locked if any exceptions thrown between the call to '_mutex.lock()' and '_mutex.unlock()' are unhandled in the insert logic).
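
To make the difference concrete, here is a minimal sketch of scoped locking. It uses std::mutex and std::lock_guard from the C++11 standard library as stand-ins for boost::mutex and boost::lock_guard, which follow the same RAII pattern. FakeClient, g_clientMutex, guardedInsert and mutexIsFree are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared client object; not a driver class.
struct FakeClient {
    std::vector<std::string> inserted;
    void insert(const std::string& doc) {
        if (doc.empty()) throw std::runtime_error("empty document");
        inserted.push_back(doc);
    }
};

std::mutex g_clientMutex;  // guards every use of the shared client

// The guard's destructor releases the mutex on normal return and during
// stack unwinding, so there is no explicit unlock() call to skip.
void guardedInsert(FakeClient& client, const std::string& doc) {
    std::lock_guard<std::mutex> guard(g_clientMutex);
    client.insert(doc);  // may throw; the mutex is released either way
}

// Returns true if the mutex can be acquired right now (and leaves it free).
bool mutexIsFree() {
    if (!g_clientMutex.try_lock()) return false;
    g_clientMutex.unlock();
    return true;
}
```

With explicit lock()/unlock() calls, an exception escaping between the two would leave the mutex held and block every later caller; with the guard, the exception propagates and the mutex is free again.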

I'm resolving this ticket as "Works as Designed". Feel free to post to the mongodb-user mailing list with any design questions about using the legacy C++ driver in multi-threaded applications.

Best,
~ Rassi

Comment by Mariano Botta [ 23/Jun/16 ]

Hi Rassi,

Yes, the threads are sharing the same client connection; actually, there
is only one connection to the mongo server and all the calls to the
server go through it; the call to insert values is inside a boost
mutex lock/unlock (see below).

The server is standalone, at localhost. There is no replica or cluster.

I have a previous version of the application on a production server,
which used MongoDB 2.6.5. Right now it’s using MongoDB 3.2.0 and it’s working
fine. That application version does not use threads; we introduced them
because operations were taking too much time.

Just for testing, how should I handle mongo connections in this context?
Should I use a connection pool? How can I use it here? Or is there
another solution I’m missing?

This is the code to connect to mongo:

string host;
static string STR_UNDEFINED_PATH = "UNDEFINED_PATH";

if (parseConnectionString(conn_str, host) != 0)
{
    throw(exception("can't find host specification in connection string"));
}

_connStr = conn_str;
_host = host;
_infoStr = info;

clt_->setSoTimeout(so_timeout_);
clt_->connect(host, info);

return 0;
 
Connection string is just localhost and info is empty.
 
This is a code portion I’m using when inserting data:
 
static boost::mutex _mutex;

_mutex.lock();

mongo::BSONObjBuilder b;

try {
    std::string collection = getCollectionName(id);
    marshalEvent(id, e, b); // returns the BSONObject with the values to insert.
    mongo::BSONObj obj = b.obj();
    if (!obj.isEmpty()) {
        clt_->insert(collection, obj);
    }
} catch (boost::thread_interrupted&) {
    LOGGER->Log(L_ERROR, "Insert interrupted", SIGNAT);
} catch (mongo::AssertionException& e) {
    stringstream ss;
    ss << "Mongo AssertionException: Code: " << e.getCode() << " -" << e.what();
    LOGGER->Log(L_WARN, ss.str().c_str(), SIGNAT);
    Sleep(5000);
    if (!(clt_->isStillConnected())) {
        LOGGER->Log(L_INFO, "Connection lost. Reconnecting...", SIGNAT);
        std::string info;
        clt_->connect((std::string)_host, info);
    }
} catch (mongo::OperationException& e) {
    //stringstream ss;
    //ss << "Mongo OPException: Code: " << e.getCode() << "-" << e.what();
    //LOGGER->Log(L_WARN, ss.str().c_str(), SIGNAT);
} catch (mongo::DBException& e) {
    stringstream ss;
    ss << "Mongo DBException: Code: " << e.getCode() << "-" << e.what();
    LOGGER->Log(L_WARN, ss.str().c_str(), SIGNAT);
} catch (std::exception& e) {
    stringstream ss;
    ss << e.what();
    LOGGER->Log(L_ERROR, ss.str().c_str(), SIGNAT);
    //throw(exception(s + " at " + SIGNAT));
} catch (...) {
    std::string s("Unknown Exception");
    LOGGER->Log(L_ERROR, s.c_str(), SIGNAT);
    //throw(exception(s + " at " + SIGNAT));
}

_mutex.unlock();
 
And this is one of the reading functions:
 
Arguments:

  • st: timestamp
  • nvalues: number of values requested.
  • forward: whether to return the values after (true) or before (false) the st timestamp.
  • ve: return list
 
try {
    ve.clear();

    // collection where the values are stored.
    std::string collection = getCollectionName(id);
    std::auto_ptr<mongo::DBClientCursor> cursor;
    mongo::Query qry;

    if (forward) {
        qry = MONGO_QUERY("_id" << BSON("$gte" << ToMongoDate(st)));
        cursor = clt_->query(collection, qry.sort("_id", 1), (int)nvalues);
    }
    else {
        qry = MONGO_QUERY("_id" << BSON("$lte" << ToMongoDate(st)));
        cursor = clt_->query(collection, qry.sort("_id", -1), (int)nvalues);
    }

    if (!cursor.get()) {
        stringstream ss;
        ss << "Query failure >> Collection: " << collection << " query: " << qry.toString() << " forward: " << forward;
        LOGGER->Log(L_INFO, ss.str().c_str(), SIGNAT);
        return ::CODE_EVENT_NOT_FOUND; // CODE_OK;
    }

    SEvent e;
    while (cursor->more()) {
        unmarshalEvent(cursor->next(), id, e); // extracts the values
        ve.push_back(e); // adds the values to the list.
    }

    return ::CODE_OK;
} catch (mongo::UserException& e) {
    std::string s(e.what());
    LOGGER->Log(L_ERROR, s.c_str(), SIGNAT);
    return ::CODE_FAIL;
}
catch (const std::exception& e) {
    std::string s(e.what());
    LOGGER->Log(L_ERROR, s.c_str(), SIGNAT);
    return ::CODE_FAIL;
} catch (...) {
    std::string s("Unhandled exception");
    LOGGER->Log(L_ERROR, s.c_str(), SIGNAT);
    return ::CODE_FAIL;
}

Attached you can find the mongo.log file; I can’t find any reference to
these errors in it.

Thanks and regards,

Mariano

Comment by J Rassi [ 23/Jun/16 ]

Hi mariano.botta@soteicavisualmesa.com,

Both assertion failures you are observing are indicative of the client processing invalid messages from the server, and are unrelated to cursor timeout issues. In the first failure (dbclientcursor.cpp:228), the client is receiving a reply to an OP_GET_MORE message in which the "cursor not found" result flag is set and the "cursor" field is non-zero (it's invalid for the "cursor" field to be non-zero when this flag is set). In the second failure (message_port.cpp:278), the "responseTo" header field of the message response is not equal to the "requestID" header of the corresponding message sent (they are expected to be the same).
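
The two conditions described above can be sketched as simple predicates over the message fields. This is only an illustration, not the driver's code: the struct and function names are hypothetical, and the structs are simplified views that keep only the fields mentioned here (the standard message header, and the response flags and cursor ID of a reply, where bit 0 of the flags is "cursor not found").

```cpp
#include <cassert>
#include <cstdint>

// Simplified view of the standard wire-protocol message header.
struct MsgHeader {
    std::int32_t messageLength;
    std::int32_t requestID;   // chosen by the sender of the message
    std::int32_t responseTo;  // requestID of the message being answered
    std::int32_t opCode;
};

// Simplified view of the reply fields relevant to the first assertion.
struct ReplyFields {
    std::int32_t responseFlags;
    std::int64_t cursorID;
};

const std::int32_t kCursorNotFound = 1 << 0;  // bit 0 of responseFlags

// Second failure: a reply must answer the request that was just sent.
bool replyMatchesRequest(const MsgHeader& request, const MsgHeader& reply) {
    return reply.responseTo == request.requestID;
}

// First failure: "cursor not found" and a non-zero cursor ID contradict
// each other, so such a reply is treated as invalid.
bool cursorFieldsConsistent(const ReplyFields& reply) {
    if (reply.responseFlags & kCursorNotFound) {
        return reply.cursorID == 0;
    }
    return true;
}
```

If two threads interleave their traffic on one socket, a thread can read the reply intended for the other, which is one way for either predicate to come out false.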

Based on the information I have available at this point (multithreaded application encountering invalid messages from the server under high load), I suspect a possible concurrency error in your application as the root cause, where one thread is reading data off of a socket owned by another thread. Otherwise, it's certainly possible that the issue is caused by a bug in the client driver, a bug in the server, or corruption of wire messages over the network.

I'll need additional information to further diagnose this issue. Could you please provide answers to the following questions:

  • Are your application worker threads sharing the same client connection objects (DBClientBase or one of its derived classes DBClientConnection/DBClientReplicaSet/etc) or DBClientCursor objects without synchronization?
  • Could you provide the full application log file covering a period of time where this issue occurs? I'm expecting to see a particular log entry with the text "MessagingPort::call wrong id", which will reveal extra information about the invalid message headers received.
  • Are you connecting to a standalone server, or to a replica set? What is your cluster connection string?
  • Does the issue go away when you downgrade to the MongoDB server version before 3.0.0 that you were previously on? Which version is this?
  • Would you be able to provide the full source code for your application, or code excerpts for the logic where your application reads data from the server?

Let me know if you'd like any additional clarification on the above questions.

Thanks,
~ Rassi
