[SERVER-5541] crash in C++ client driver during shutdowing primary mongo server from repset
Created: 07/Apr/12  Updated: 11/Jul/16  Resolved: 11/Apr/12

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.0.2, 2.0.4
Fix Version/s: 2.0.5, 2.1.1

Type: Bug
Priority: Critical - P2
Reporter: Alexander Borodetsky
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6
server 2.0.4
C++ client 2.0.2-pre


Operating System: Linux
Participants:

 Description   

I was testing the stability of my client application, which uses a MongoDB server, and the client crashed inside the MongoDB C++ client driver at the moment I shut down the primary server.
The crash reproduces every time; see the steps below.

Here is the stack trace of the crash:
#0 0x0830d98d in mongo::DBClientCursor::peekError(mongo::BSONObj*) ()
#1 0x08313f6b in mongo::DBClientReplicaSet::checkSlaveQueryResult(std::auto_ptr<mongo::DBClientCursor>) ()
#2 0x08313a3c in mongo::DBClientReplicaSet::query(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) ()

Steps to reproduce:

Create a replica set configuration like this:
PRIMARY> rs.conf()
{
  "_id" : "wop_test",
  "version" : 15,
  "members" : [
    { "_id" : 0, "host" : "10.68.11.136:27017", "priority" : 26 },
    { "_id" : 1, "host" : "10.68.11.138:27017", "votes" : 0, "priority" : 0 },
    { "_id" : 2, "host" : "10.68.11.138:27018", "votes" : 0, "priority" : 0 }
  ]
}

Shut down servers #1 and #2 (the members with no votes).

Then shut down the last remaining member, primary #0, and immediately make a request from the client.
If the request is made a few seconds after shutting down #0, the client handles it correctly (it throws an exception).

Could this be because I am using the older C++ client 2.0.2-pre with server 2.0.4?



 Comments   
Comment by auto [ 17/Apr/12 ]

Author: Randolph Tan <randolph@10gen.com>

Message: SERVER-5541 crash in C++ client driver during shutdowing primary mongo server from repset
Branch: v2.0
https://github.com/mongodb/mongo/commit/d87357cf26a75dca02407f03326876e1d1803369

Comment by Randolph Tan [ 17/Apr/12 ]

You're welcome.

Comment by Alexander Borodetsky [ 17/Apr/12 ]

Thanks a lot for your fix and assistance.
It works properly now.

Comment by Randolph Tan [ 11/Apr/12 ]

Hi,

For consistency, the new commit now returns whatever the underlying query method returns to you. Your client code should therefore be prepared to handle a null cursor, as you would when using DBClientConnection.
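A minimal sketch of what "be prepared to handle null pointers" can look like on the client side. It does not use the real driver: FakeCursor, QueryFn and fetchAll are hypothetical stand-ins, and std::unique_ptr replaces the std::auto_ptr that the 2.0-era driver returns. The only point it illustrates is that the caller checks the returned cursor before dereferencing it and turns a missing cursor into an ordinary, catchable error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for mongo::DBClientCursor (not the real driver class).
struct FakeCursor {
    std::vector<std::string> docs;
    std::size_t pos = 0;
    bool more() const { return pos < docs.size(); }
    std::string next() { return docs.at(pos++); }
};

// Hypothetical query callback. Like DBClientConnection::query (and, after the
// fix, DBClientReplicaSet::query), it may return an empty (null) cursor.
using QueryFn = std::function<std::unique_ptr<FakeCursor>(const std::string& ns)>;

// Runs the query and never dereferences a null cursor.
std::vector<std::string> fetchAll(const QueryFn& query, const std::string& ns) {
    std::unique_ptr<FakeCursor> cursor = query(ns);
    if (!cursor) {
        // No cursor at all, e.g. the primary was shut down a moment ago.
        throw std::runtime_error("query on " + ns + " returned no cursor");
    }
    std::vector<std::string> out;
    while (cursor->more()) {
        out.push_back(cursor->next());
    }
    return out;
}
```

With this shape, the "request immediately after shutdown" case from the report becomes a std::runtime_error that the application's existing std::exception handler can catch, instead of a SIGSEGV.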

Comment by auto [ 11/Apr/12 ]

Author: Randolph Tan <randolph@10gen.com>

Message: SERVER-5541 crash in C++ client driver during shutdowing primary mongo server from repset
Branch: master
https://github.com/mongodb/mongo/commit/ab60e77554527f22626d050fe6b01ee44fb55e01

Comment by Alexander Borodetsky [ 09/Apr/12 ]

The assert doesn't solve this issue properly. In a debug build this fix leads to a call to abort() on Windows or raise(SIGTRAP) on Linux, and my server terminates abnormally.
To work around it I tried putting
if ( !result.get() ) throw AssertionException ("no slaves due no cursor",0);
instead of assert( result.get() );

But that leads to DBClientReplicaSet::query returning a NULL cursor without throwing any exception. This happens because checkMaster()->query() is called after three failed attempts through checkSlaveQueryResult(), and checkMaster()->query() returns a NULL cursor too.

So my question is: is it correct for DBClientReplicaSet::query to return a NULL cursor?
From reading the source code, I understand that it is, and that I have to handle a NULL cursor. Am I right?
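The behaviour described in this comment can be modelled in a few lines. This is a simplified, self-contained sketch, not the driver's code: Cursor, NodeQuery, checkSlaveResultOrThrow and replicaSetQuery are hypothetical names, and the real implementation differs in detail (exception types, logging, how the secondary is chosen). It shows why throwing from the slave check does not reach the caller: each exception is swallowed by the three-attempt retry loop, and the final call to the primary hands back its null cursor unchanged.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>

struct Cursor {};  // hypothetical stand-in for mongo::DBClientCursor

using CursorPtr = std::unique_ptr<Cursor>;
// Hypothetical: "send the query to one node and return its cursor".
using NodeQuery = std::function<CursorPtr()>;

// The workaround from this comment: throw instead of assert() when no cursor came back.
CursorPtr checkSlaveResultOrThrow(CursorPtr result) {
    if (!result) {
        throw std::runtime_error("no slaves due no cursor");
    }
    return result;
}

// Simplified model of the slaveOk branch of DBClientReplicaSet::query:
// three attempts against a secondary, each failure swallowed, then the primary.
CursorPtr replicaSetQuery(const NodeQuery& secondary, const NodeQuery& primary,
                          int& secondaryFailures) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        try {
            return checkSlaveResultOrThrow(secondary());
        } catch (const std::exception&) {
            ++secondaryFailures;  // the driver records the failure and retries
        }
    }
    // The primary's result is returned as-is, so a null cursor from a primary
    // that has just gone down reaches the caller without any exception.
    return primary();
}
```

If both nodes return null, the caller gets a null cursor after three swallowed exceptions, which is exactly the situation the question above asks about.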

Comment by auto [ 09/Apr/12 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: assert that we have a cursor rather than segv SERVER-5541
Branch: master
https://github.com/mongodb/mongo/commit/ea6fb0bcc1233b8439e57225df3d4b088e019662

Comment by Eliot Horowitz (Inactive) [ 08/Apr/12 ]

Pushed a possible fix to 2.0.
Can you try it either from git or with tomorrow's build?

Comment by auto [ 08/Apr/12 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: assert that we have a cursor rather than segv SERVER-5541
Branch: v2.0
https://github.com/mongodb/mongo/commit/dbf67c31dc46b0f5bac2ed8827be2f3cb39a2dda

Comment by Alexander Borodetsky [ 07/Apr/12 ]

The crash is caused by the following (I caught it in the debugger on Windows):

DBClientReplicaSet::checkSlaveQueryResult( auto_ptr<DBClientCursor> result ) is being called with a NULL value for the argument "result".
This happens because its caller, DBClientReplicaSet::query, calls checkSlave()->query(...), which returns a null cursor.

Here is the call stack:
char-srv-dbg.exe!std::_Debug_message(const wchar_t * message=0x0099f480, const wchar_t * file=0x00995860, unsigned int line=742) Line 24 C++
char-srv-dbg.exe!std::auto_ptr<mongo::DBClientCursor>::operator->() Line 742 + 0x14 bytes C++
char-srv-dbg.exe!mongo::DBClientReplicaSet::checkSlaveQueryResult(std::auto_ptr<mongo::DBClientCursor> result=auto_ptr {b={...} _client=??? ns={...} ...}) Line 798 + 0xc bytes C++
> char-srv-dbg.exe!mongo::DBClientReplicaSet::query(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & ns="wop_rel.log", mongo::Query query={...}, int nToReturn=150, int nToSkip=0, const mongo::BSONObj * fieldsToReturn=0x00000000, int queryOptions=4, int batchSize=0) Line 753 + 0xa6 bytes C++
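The crash path identified in this comment reduces to dereferencing an empty smart pointer. Below is a minimal sketch of a guarded check, assuming the fix passes a null cursor straight back to the caller, which matches Randolph Tan's comment of 11/Apr/12 that the query now returns whatever the underlying query method returns. Cursor and checkSlaveQueryResultGuarded are hypothetical; std::unique_ptr stands in for the driver's std::auto_ptr, and the error branch is deliberately simplified.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for mongo::DBClientCursor. The real peekError()
// takes a BSONObj* out-parameter; this one only reports a flag.
struct Cursor {
    bool hasError = false;
    bool peekError() const { return hasError; }
};

using CursorPtr = std::unique_ptr<Cursor>;

// Unguarded shape (what crashed): result->peekError() is evaluated even when
// result is null. That is undefined behaviour; in this report it surfaced as
// SIGSEGV in peekError() on Linux and as a debug assertion in
// auto_ptr::operator-> on Windows.
//
// Guarded shape: a null cursor is handed back unchanged, so the caller
// decides what an absent cursor means.
CursorPtr checkSlaveQueryResultGuarded(CursorPtr result) {
    if (!result) {
        return result;
    }
    if (result->peekError()) {
        // Simplification: treat any query error from the secondary as fatal.
        throw std::runtime_error("query error reported by secondary");
    }
    return result;
}
```

Returning the null cursor rather than asserting keeps replica-set queries consistent with single-connection queries, at the cost of requiring every caller to check the result.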

Comment by Alexander Borodetsky [ 07/Apr/12 ]

Every other gdb command returns a "no debug info" error.

Comment by Alexander Borodetsky [ 07/Apr/12 ]

Unfortunately I have some trouble with symbols on my build system.

So I only have the backtrace (in the main description) and this info:

(gdb) info frame
Stack level 0, frame at 0xdfdea980:
eip = 0x830eb3d in mongo::DBClientCursor::peekError(mongo::BSONObj*); saved eip 0x8315f79
called by frame at 0xdfdea9e0
source language c++.
Arglist at 0xdfdea978, args:
Locals at 0xdfdea978, Previous frame's sp is 0xdfdea980
Saved registers:
ebp at 0xdfdea978, eip at 0xdfdea97c

Do you have any ideas about gdb?

Comment by Alexander Borodetsky [ 07/Apr/12 ]

Yes, I'm trying that now; wait a minute. I have no symbols on my test station, so I need to transfer the core dump to the build station.

Comment by Eliot Horowitz (Inactive) [ 07/Apr/12 ]

Can you run it in gdb so you can get the line number of the seg fault?

Comment by Alexander Borodetsky [ 07/Apr/12 ]

I'm catching exceptions, but only std::exception, not SIGSEGV :)

Comment by Alexander Borodetsky [ 07/Apr/12 ]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xe25efb70 (LWP 17542)]
0x0830eb3d in mongo::DBClientCursor::peekError(mongo::BSONObj*) ()

But if I wait a few seconds before making the request, the client handles the server shutdown with an exception (which I suppose is the correct behaviour).

Comment by Eliot Horowitz (Inactive) [ 07/Apr/12 ]

Was it a crash or just an exception being thrown?
Are you catching exceptions?

Comment by Alexander Borodetsky [ 07/Apr/12 ]

I got the C++ client driver 2.0.4 source code and rebuilt my solution (the client, from MongoDB's point of view) with it.
The client still crashes at the same place.

Comment by Alexander Borodetsky [ 07/Apr/12 ]

Note: it's important to make the request immediately after shutting down the server. Otherwise (if you are too late), the client handles the shutdown of the last server with the correct exception.

Generated at Thu Feb 08 03:09:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.