[SERVER-7041] DBClientCursor::init call() failed Created: 14/Sep/12  Updated: 03/Jul/13  Resolved: 01/Oct/12

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Goran Nikolovski Assignee: Gregor Macadam
Resolution: Done Votes: 1
Labels: crash, driver
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Ubuntu 11.04
MongoDB version: v2.2.0
Hardware: Intel(R) Core(TM) i7 CPU 64-bit, 8GB RAM


Attachments: MongoSample.tar.gz
Operating System: Linux

Description

I have developed a server application in C++ that uses the v2.2.0 C++ driver to connect to MongoDB.
While the server is receiving client requests, it performs MongoDB operations (connect, query, insert...). If I kill the MongoDB server (Ctrl + C) while the application server is performing MongoDB operations, in some cases I catch "socket exception [CONNECT_ERROR] for localhost:27017", but in many cases I cannot catch any exception and the server application shuts down with the message: "DBClientCursor::init call() failed".
What is the meaning of this message? Can I catch and handle it somehow, or is it a bug in the MongoDB driver?
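For context, a minimal sketch of the pattern being described; the host, namespace, and function name are placeholders, not the reporter's actual code:

#include <mongo/client/dbclient.h>
#include <iostream>
#include <memory>

void handleRequest() {
    try {
        mongo::DBClientConnection conn;
        conn.connect("localhost:27017");   // throws on CONNECT_ERROR

        std::auto_ptr<mongo::DBClientCursor> cursor =
            conn.query("mydb.mytable", mongo::Query());
        // If mongod is killed mid-query, the 2.2 driver logs
        // "DBClientCursor::init call() failed" and returns a NULL cursor
        // instead of throwing (see the comments below), so the catch
        // block never runs and the unchecked dereference here can take
        // the process down.
        while (cursor->more())
            cursor->next();
    } catch (const mongo::DBException& e) {
        std::cout << "caught: " << e.what() << std::endl;
    }
}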



Comments
Comment by Ian Whalen (Inactive) [ 09/Oct/12 ]

dbabits, please open a new SERVER ticket with your issue so that we can track it (and tie it back to this ticket if appropriate).

Comment by david babits [ 09/Oct/12 ]

I am getting the same problem while using the official mongo client:

Tue Oct 9 13:13:14 DBClientCursor::init call() failed
Tue Oct 9 13:13:14 Error: Error during mongo startup. :: caused by :: 10276 DBClientBase::findN: transport error: xxx.xxx.xxx.xxx:28017 ns: admin.$cmd query: { whatsmyuri: 1 }
src/mongo/shell/mongo.js:93

/home/hprspkg/hprs_3rd_party/mongodb-linux-x86_64-static-legacy-2.2.0/bin/mongo localhost:28017/mydb --quiet --eval 'db.mytable.find({id:{$in:["jq_2395764120","jq_2395764121"]}}).forEach(printjson);'

The server had been shut down beforehand with CTRL-C.

Comment by Gregor Macadam [ 01/Oct/12 ]

Resolving as "works as designed" - feel free to reopen if you don't feel this is the case.

Comment by Randolph Tan [ 22/Sep/12 ]

I am saying that MongoConnection shouldn't have a ScopedDbConnection as a member variable unless you make your class non-copyable (in fact, we should have made ScopedDbConnection non-copyable). Here is why the current code is dangerous:

MongoConnection conn;

// Since conn is not passed by reference here, a copy is created. At this
// point two different objects hold a pointer to the same ScopedDbConnection,
// i.e. it is shared by more than one owner.
inserting(conn);

Note that the copy's scope is limited to the inserting function. When the function returns, the copy's MongoConnection destructor runs, which in turn attempts to destruct the ScopedDbConnection (which you have now placed a flag to hack around).
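A minimal sketch of that suggestion, assuming the MongoConnection layout from the attached sample (the wrapper class itself is not driver API) and the 2.2 driver's ScopedDbConnection::getScopedDbConnection factory:

#include <mongo/client/dbclient.h>
#include <string>

class MongoConnection {
public:
    explicit MongoConnection(const std::string& host)
        : _scoped(mongo::ScopedDbConnection::getScopedDbConnection(host)) {}

    ~MongoConnection() {
        _scoped->done();   // return the underlying connection to the pool
        delete _scoped;
    }

    mongo::DBClientBase& conn() { return _scoped->conn(); }

private:
    // C++03 idiom: declared private and never defined, so any attempt
    // to copy (e.g. passing MongoConnection by value) fails to compile.
    MongoConnection(const MongoConnection&);
    MongoConnection& operator=(const MongoConnection&);

    mongo::ScopedDbConnection* _scoped;
};

// Callers must now take the wrapper by reference:
// void inserting(MongoConnection& conn);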

Comment by Aleksandar [ 22/Sep/12 ]

In our project, each of the server's clients (i.e. threads) gets its own instance of MongoConnection, so there is no double free. The problem occurs when I call done() and delete on a specific ScopedDbConnection after the connection to the primary node has failed (the primary went down): it looks like the memory the ScopedDbConnection pointer points to gets corrupted, and calling done() and delete on that pointer results in SIGABRT. The NULL check we perform in the MongoConnection destructor was just a quick guess that we were double freeing, and it did not solve the problem. Adding a flag recording that the connection to the primary failed did solve it, but with possible memory leaks.

Comment by Randolph Tan [ 21/Sep/12 ]

I took a glance at your code and noticed that you are storing the ScopedDbConnection in the MongoConnection class. This is OK as long as you don't share the connection and don't pass the pointer around. (This is probably what caused the crash: ScopedDbConnection is not meant to be copied, and doing a shallow copy of it can cause a double free.)

Comment by Aleksandar [ 21/Sep/12 ]

Here is a link to a simple project that we just wrote: http://dl.dropbox.com/u/15196029/MongoSample.tar.gz. It performs essentially the same MongoDB operations as our project, and with it we managed to find the cause of the problem.
When the primary node goes down and a DBException is thrown, the ScopedDbConnection being used apparently gets corrupted and you can no longer call done() or its destructor; doing so results in SIGABRT. Our workaround is to set a bool (withExc) when an exception is thrown, indicating that the scoped connection failed; in that case neither done() nor the destructor is called. However, I'm not sure how this affects memory, i.e. whether anything leaks. My question is: is there any way to avoid this problem other than this workaround?
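A sketch of that withExc workaround, with placeholder host and namespace names; the object is deliberately leaked after a failure, trading a possible leak for the SIGABRT:

#include <mongo/client/dbclient.h>
#include <iostream>
#include <memory>
#include <string>

void queryOnce(const std::string& host) {
    mongo::ScopedDbConnection* scoped =
        mongo::ScopedDbConnection::getScopedDbConnection(host);
    bool withExc = false;
    try {
        std::auto_ptr<mongo::DBClientCursor> cursor =
            scoped->conn().query("mydb.mytable", mongo::Query());
        while (cursor.get() && cursor->more())
            std::cout << cursor->next().toString() << std::endl;
    } catch (const mongo::DBException& e) {
        std::cout << "query failed: " << e.what() << std::endl;
        withExc = true;   // mark the scoped connection as failed
    }
    if (!withExc) {
        scoped->done();   // only healthy connections go back to the pool
        delete scoped;
    }
    // When withExc is true, neither done() nor the destructor runs;
    // `scoped` is leaked on purpose, as in the attached sample.
}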

Comment by Gregor Macadam [ 20/Sep/12 ]

Is this easily reproducible for you? I've tried to reproduce it with a small program that uses DBClientReplicaSet to connect to the replica set. I have three nodes with priorities 3, 2, 1.
Just to check that I am doing the same thing as you, here is my sequence:
1. I have the priority 2 and priority 1 nodes running; 2 is primary.
2. I bring up the priority 3 node; it becomes a secondary.
3. I start my server; it connects to the replica set and begins to query in a loop.
4. The priority 2 node steps down and the server catches DBExceptions while trying to access the cursor.
5. The priority 3 node becomes primary and the server can query the cursor again.

I don't get a SIGABRT, however. Is there some code you could attach that reproduces this?
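Not the actual test program, but a sketch of the kind of loop described above; the set name, hosts, and namespace are placeholders:

#include <mongo/client/dbclient.h>
#include <iostream>
#include <memory>
#include <vector>
#include <unistd.h>

int main() {
    std::vector<mongo::HostAndPort> hosts;
    hosts.push_back(mongo::HostAndPort("localhost:27017"));
    hosts.push_back(mongo::HostAndPort("localhost:27018"));
    hosts.push_back(mongo::HostAndPort("localhost:27019"));

    mongo::DBClientReplicaSet rs("test1", hosts);
    if (!rs.connect()) {
        std::cout << "cannot connect to the set" << std::endl;
        return 1;
    }
    for (;;) {
        try {
            std::auto_ptr<mongo::DBClientCursor> cursor =
                rs.query("mydb.mytable", mongo::Query());
            if (!cursor.get()) {
                // cursor init failed: logged by the driver, not thrown
                std::cout << "NULL cursor" << std::endl;
            } else {
                while (cursor->more())
                    cursor->next();
            }
        } catch (const mongo::DBException& e) {
            // expected while the primary steps down
            std::cout << "caught DBException: " << e.what() << std::endl;
        }
        sleep(1);
    }
    return 0;
}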

Comment by Aleksandar [ 19/Sep/12 ]

Handling the NULL cursor case seems to have resolved the issue when not using a replica set.
However, with a replica set of three nodes whose priorities are set to 3, 2, 1, the result is the same. Tests have shown that when all nodes are up and our server app is under load, if the primary node goes down there is immediately a "DBClientCursor::init call() failed" message followed by an abort signal. The same happens when the priority 3 node comes back up: it tries to become primary again, so the current primary steps down to secondary while our server app is performing operations against it, and our application immediately fails the same way.
In the other case the scenario follows these steps:
1. our server application is down and clients try to reach it
2. the node with the highest priority is down; the second-highest priority node is primary
3. the node with the highest priority comes back up
4. all nodes become secondary
5. our server comes up; all nodes are still secondary
6. the server accepts clients and tries to perform operations in mongo; all nodes are still secondary
Result: while all nodes are secondary (i.e. until the highest priority node becomes primary again), starting the server app immediately results in "warning: No primary detected for set test1" followed by an abort signal.

Comment by Randolph Tan [ 18/Sep/12 ]

"DBClientCursor::init call() failed" is a log message that appears when the driver cannot initialize the cursor; in that case the query returns a NULL DBClientCursor pointer instead of throwing. Does your application handle the NULL cursor case?
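A minimal sketch of that NULL-cursor check; the function name and namespace are placeholders:

#include <mongo/client/dbclient.h>
#include <iostream>
#include <memory>

void findAll(mongo::DBClientBase& conn) {
    std::auto_ptr<mongo::DBClientCursor> cursor =
        conn.query("mydb.mytable", mongo::Query());
    if (!cursor.get()) {
        // The driver logged "DBClientCursor::init call() failed" and
        // returned NULL; no exception is thrown for this case, so the
        // NULL must be handled explicitly before dereferencing.
        std::cout << "query failed: NULL cursor" << std::endl;
        return;
    }
    while (cursor->more())
        std::cout << cursor->next().toString() << std::endl;
}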

Comment by Aleksandar [ 18/Sep/12 ]

I just tried catching the signal; it is SIGABRT. This situation is more likely to happen when performing a larger number of operations against MongoDB, i.e. when the server is under load.

Comment by Aleksandar [ 18/Sep/12 ]

I have the same problem with the v2.2.0 C++ driver. I tested with and without a replica set and got the same result. When using a replica set, if the primary node goes down (Ctrl + C) the application ends with "DBClientCursor::init call() failed" and no exception can be caught in my application; it looks like the driver raises some signal.
