[SERVER-9869] C++ driver crash when mongodb is shut down Created: 06/Jun/13  Updated: 11/Jul/16  Resolved: 26/Jun/13

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Stefan Tacalau Assignee: Andrew Morrow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 3.4.37-40.44.amzn1.x86_64 #1 SMP Thu Mar 21 01:17:08 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


Operating System: Linux
Participants:

 Description   

One of our applications crashed with the following stack trace in a C++ MongoDB driver thread:

Program terminated with signal 6, Aborted.
#0 0x00007fe627b40945 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x00007fe627b40945 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fe627b4225b in abort () at abort.c:92
#2 0x00007fe627b3937e in __assert_fail_base (fmt=<value optimized out>, assertion=0x7fe62af586c4 "!pthread_mutex_lock(&m)",
file=0x7fe62af58790 "/usr/include/boost/thread/pthread/mutex.hpp", line=<value optimized out>, function=<value optimized out>) at assert.c:96
#3 0x00007fe627b39422 in __assert_fail (assertion=0x7fe62af586c4 "!pthread_mutex_lock(&m)", file=0x7fe62af58790 "/usr/include/boost/thread/pthread/mutex.hpp", line=125,
function=0x7fe62af65100 "void boost::timed_mutex::lock()") at assert.c:105
#4 0x00007fe62af3f2c3 in lock (this=0x2158180, t=0x0) at /usr/include/boost/thread/pthread/mutex.hpp:125
#5 lock (this=0x2158180, t=0x0) at /usr/include/boost/thread/locks.hpp:349
#6 unique_lock (this=0x2158180, t=0x0) at /usr/include/boost/thread/locks.hpp:227
#7 scoped_lock (this=0x2158180, t=0x0) at src/mongo/util/concurrency/mutex.h:104
#8 mongo::Logstream::flush (this=0x2158180, t=0x0) at src/mongo/util/log.cpp:357
#9 0x00007fe62af3fb8e in mongo::Logstream::operator<< (this=<value optimized out>, _endl=<value optimized out>) at src/mongo/util/log.h:307
#10 0x00007fe62aecb128 in mongo::ReplicaSetMonitor::_checkConnection (this=0x1f49920, conn=0x1f94c80, maybePrimary=<value optimized out>, verbose=true, nodesOffset=0)
at src/mongo/client/dbclient_rs.cpp:766
#11 0x00007fe62aecb66c in mongo::ReplicaSetMonitor::_check (this=0x1f49920, checkAllSecondaries=true) at src/mongo/client/dbclient_rs.cpp:814
#12 0x00007fe62aecf54c in mongo::ReplicaSetMonitor::check (this=0x1f49920, checkAllSecondaries=true) at src/mongo/client/dbclient_rs.cpp:955
#13 0x00007fe62aecf7a3 in mongo::ReplicaSetMonitor::checkAll (checkAllSecondaries=true) at src/mongo/client/dbclient_rs.cpp:322
#14 0x00007fe62aed7f74 in mongo::ReplicaSetMonitorWatcher::run (this=<value optimized out>) at src/mongo/client/dbclient_rs.cpp:202
#15 0x00007fe62af2c3db in mongo::BackgroundJob::jobBody (this=0x7fe62b196000, status=...) at src/mongo/util/background.cpp:64
#16 0x00007fe62af2e309 in operator() (this=<value optimized out>) at /usr/include/boost/bind/mem_fn_template.hpp:162
#17 operator()<boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list0> (this=<value optimized out>)
at /usr/include/boost/bind/bind.hpp:306
#18 operator() (this=<value optimized out>) at /usr/include/boost/bind/bind_template.hpp:20
#19 boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >::run (this=<value optimized out>)
at /usr/include/boost/thread/detail/thread.hpp:56
#20 0x00007fe62885fd97 in boost::(anonymous namespace)::thread_proxy (param=<value optimized out>) at /usr/src/debug/boost-1.41.0.cmake0/libs/thread/src/pthread/thread.cpp:120
#21 0x00007fe628640c6b in start_thread (arg=0x7fe620d04700) at pthread_create.c:301
#22 0x00007fe627bee5ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

It seems that the crash happened when MongoDB was shut down. There is a long period of inactivity in the MongoDB log until the moment of the shutdown:
Sat May 11 10:00:42 got signal 15 (Terminated), will terminate after current cmd ends
Sat May 11 10:00:42 [interruptThread] now exiting
Sat May 11 10:00:42 dbexit:
Sat May 11 10:00:42 [interruptThread] shutdown: going to close listening sockets...
Sat May 11 10:00:42 [interruptThread] closing listening socket: 6
Sat May 11 10:00:42 [interruptThread] closing listening socket: 7
Sat May 11 10:00:42 [interruptThread] removing socket file: /tmp/mongodb-27017.sock
Sat May 11 10:00:42 [conn15] got request after shutdown()
Sat May 11 10:00:42 [conn21] command vmdb.$cmd command: { getlasterror: 1, w: 2, wtimeout: 0 } ntoreturn:1 keyUpdates:0 reslen:135 85934181ms
Sat May 11 10:00:42 [conn31] command vmdb.$cmd command: { getlasterror: 1, w: 2, wtimeout: 0 } ntoreturn:1 keyUpdates:0 reslen:135 85812168ms
Sat May 11 10:00:42 [conn21] SocketException handling request, closing client connection: 9001 socket exception [2] server [172.31.1.9:48545]
Sat May 11 10:00:42 [interruptThread] shutdown: going to flush diaglog...
Sat May 11 10:00:42 [interruptThread] shutdown: going to close sockets...
Sat May 11 10:00:42 [interruptThread] shutdown: waiting for fs preallocator...
Sat May 11 10:00:42 [interruptThread] shutdown: lock for final commit...
Sat May 11 10:00:42 [interruptThread] shutdown: final commit...
Sat May 11 10:00:42 [conn31] SocketException handling request, closing client connection: 9001 socket exception [2] server [172.31.1.9:48591]
Sat May 11 10:00:42 [interruptThread] shutdown: closing all files...
Sat May 11 10:00:42 [interruptThread] closeAllFiles() finished
Sat May 11 10:00:42 [interruptThread] journalCleanup...
Sat May 11 10:00:42 [interruptThread] removeJournalFiles
Sat May 11 10:00:42 [interruptThread] shutdown: removing fs lock...
Sat May 11 10:00:42 dbexit: really exiting now
warning: bind_ip of 0.0.0.0 is unnecessary; listens on all ips by default
forked process: 1216
all output going to: /var/log/mongodb/mongodb.log
log file [/var/log/mongodb/mongodb.log] exists; copied to temporary file [/var/log/mongodb/mongodb.log.2013-05-11T10-01-39]
child process started successfully, parent exiting

The MongoDB replica set has only one node:
mongo
MongoDB shell version: 2.2.3
connecting to: test
sipxecs:PRIMARY> rs.config()
{
    "_id" : "sipxecs",
    "version" : 1,
    "members" : [
        { "_id" : 0, "host" : "openuc.ezuce.com:27017", "priority" : 2 }
    ]
}



 Comments   
Comment by Andrew Morrow (Inactive) [ 26/Jun/13 ]

Closing since the solution of using dbexit appears to be working. Feel free to re-open if the issue recurs.

Comment by Andrew Morrow (Inactive) [ 19/Jun/13 ]

Hi Daniel -

I've verified with another engineer that it is OK for you to call mongo::dbexit as you are doing. Since you aren't able to reproduce the crash right now, and since you have opened another ticket for the logging issue (thanks for doing that), I think it makes sense to close this ticket for now. If you experience the original crash now that you are using mongo::dbexit, you can re-open this ticket and we can investigate further.

Thanks,
Andrew

Comment by Daniel Stefan Tacalau [ 19/Jun/13 ]

Andrew,

Sorry for the confusing answer. Let me clarify:
1. When I said "the one above" I meant the comment dated "Jun 19 2013 10:04:32 AM UTC".
2. AFAIK mongo::dbexit cleared only the SERVER-8891-style crash, but I might be wrong. As I said, it's not reproducible and I haven't seen it since I opened this ticket.
3. Short answer: yes.

I'll open a separate ticket for the logging stuff.

Thanks,
Daniel

Comment by Andrew Morrow (Inactive) [ 19/Jun/13 ]

Hi Daniel -

A few questions and a suggestion:

  • When you say "the one above", are you referring to the crash in your original report, the one you posted, or the one from SERVER-8891?
  • When you add the call to mongo::dbexit, does this also clear up your original crash, or only the SERVER-8891-style crash?
  • Assuming that dbexit is working for you, and that it is proper for you to call it (I'm looking into that now), would that be a sufficient resolution of this issue for you?

For the C++ driver logging questions, I'd suggest opening a separate ticket so we can track the issues independently.

Thanks,
Andrew

Comment by Daniel Stefan Tacalau [ 19/Jun/13 ]

Andrew,

I forgot to mention that I always get crashes on exit, similar to the one above, in most of our applications that use the C++ mongo driver. After researching the matter I found it to be a known problem with static constructor/destructor ordering, so I worked around it by adding a call to mongo::dbexit(mongo::EXIT_CLEAN). Is this OK?
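
In code terms, the workaround looks roughly like this (a minimal sketch; the header that declares dbexit and the connection boilerplate are from memory and may differ by driver version):

#include <mongo/client/dbclient.h>   // assumed umbrella header for the legacy C++ driver
#include <string>

int main() {
    mongo::DBClientConnection conn;
    std::string err;
    if (!conn.connect("localhost:27017", err)) {
        return 1;   // could not reach the server; nothing to shut down yet
    }

    // ... application work using the driver ...

    // Tear down driver-owned state (replica set monitors, background threads)
    // explicitly, before static destructors start running at process exit.
    // Depending on the build, dbexit may call _exit() itself, in which case
    // the return below is never reached.
    mongo::dbexit(mongo::EXIT_CLEAN);
    return 0;
}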

Is there any way to enable DEBUG-level C++ driver logging in my app? I've tried using mongo::initLogging(<some-file>, true), but it does not write much. It seems to write only messages logged by calls to "log()" and not messages logged by "log(<some-level>)".
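
Concretely, what I am doing looks roughly like this (the log file path is just an example, and whether raising a global like mongo::logLevel is the supported way to surface the level-based messages in the 2.2 driver is exactly what I'm unsure about):

#include <mongo/client/dbclient.h>
#include <mongo/util/log.h>          // assumed location of initLogging / log()

void setUpDriverLogging() {
    // Send driver log output to a file, appending to it (example path).
    mongo::initLogging("/var/log/myapp/mongo-driver.log", /*append=*/true);

    // My understanding is that log(<level>) only emits when <level> is at or
    // below the global log level, so something like this would be needed to
    // see the debug-level messages -- is that right?
    mongo::logLevel = 2;
}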

Thanks,
Daniel

Comment by Daniel Stefan Tacalau [ 19/Jun/13 ]

Andrew,

Here is what I get when I run the test program from https://jira.mongodb.org/browse/SERVER-8891

sh-4.1# gdb --args ./a.out
GNU gdb (GDB) Amazon Linux (7.2-60.13.amzn1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-amazon-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/ec2-user/a.out...done.
(gdb) r
Starting program: /home/ec2-user/a.out
[Thread debugging using libthread_db enabled]
Wed Jun 19 10:01:17 starting new replica set monitor for replica set c2 with seed of localhost:27017
[New Thread 0x7ffff46ac700 (LWP 12811)]
Wed Jun 19 10:01:17 successfully connected to seed localhost:27017 for replica set c2
Wed Jun 19 10:01:17 EINTR retry 1
[Thread 0x7ffff46ac700 (LWP 12811) exited]
Wed Jun 19 10:01:17 warning: node: localhost:27017 isn't a part of set: c2 ismaster: { setName: "sipxecs", ismaster: true, secondary: false, hosts: [ "openuc.ezuce.com:27017" ], primary: "openuc.ezuce.com:27017", me: "openuc.ezuce.com:27017", maxBsonObjectSize: 16777216, localTime: new Date(1371636077662), ok: 1.0 }
Wed Jun 19 10:01:19 warning: No primary detected for set c2
Wed Jun 19 10:01:19 All nodes for set c2 are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
Wed Jun 19 10:01:19 replica set monitor for replica set c2 started, address is c2/
[New Thread 0x7ffff46ac700 (LWP 12815)]
Wed Jun 19 10:01:19 deleting replica set monitor for: c2/

Program received signal SIGSEGV, Segmentation fault.
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/gcc-4.7.2-20121109/obj-x86_64-amazon-linux/x86_64-amazon-linux/libstdc++-v3/include/bits/basic_string.h:2224
2224 const size_type __size = this->size();
(gdb) bt
#0 std::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=Unhandled dwarf expression opcode 0xf3
) at /usr/src/debug/gcc-4.7.2-20121109/obj-x86_64-amazon-linux/x86_64-amazon-linux/libstdc++-v3/include/bits/basic_string.h:2224
#1 0x00007ffff75f7c3d in operator< <char, std::char_traits<char>, std::allocator<char> > (this=0x7ffff78b9080, __k="c2") at /usr/include/c++/4.4.7/bits/basic_string.h:2317
#2 operator() (this=0x7ffff78b9080, __k="c2") at /usr/include/c++/4.4.7/bits/stl_function.h:230
#3 _M_lower_bound (this=0x7ffff78b9080, __k="c2") at /usr/include/c++/4.4.7/bits/stl_tree.h:986
#4 lower_bound (this=0x7ffff78b9080, __k="c2") at /usr/include/c++/4.4.7/bits/stl_tree.h:745
#5 lower_bound (this=0x7ffff78b9080, __k="c2") at /usr/include/c++/4.4.7/bits/stl_map.h:701
#6 std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > > > >::operator[] (this=0x7ffff78b9080, __k="c2")
at /usr/include/c++/4.4.7/bits/stl_map.h:447
#7 0x00007ffff75e7e5b in mongo::ReplicaSetMonitor::_cacheServerAddresses_inlock (this=0x612280) at src/mongo/client/dbclient_rs.cpp:265
#8 0x00007ffff75e9885 in mongo::ReplicaSetMonitor::~ReplicaSetMonitor (this=0x612280, __in_chrg=<value optimized out>) at src/mongo/client/dbclient_rs.cpp:257
#9 0x00007ffff75fa3e2 in checked_delete<mongo::ReplicaSetMonitor> (this=<value optimized out>) at /usr/include/boost/checked_delete.hpp:34
#10 boost::detail::sp_counted_impl_p<mongo::ReplicaSetMonitor>::dispose (this=<value optimized out>) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:78
#11 0x00007ffff75f4ff9 in release (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:145
#12 boost::detail::shared_count::~shared_count (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/boost/smart_ptr/detail/shared_count.hpp:217
#13 0x00007ffff75f875e in ~shared_ptr (this=0x7ffff78b9040, __x=0x612240) at /usr/include/boost/smart_ptr/shared_ptr.hpp:169
#14 ~pair (this=0x7ffff78b9040, __x=0x612240) at /usr/include/c++/4.4.7/bits/stl_pair.h:68
#15 destroy (this=0x7ffff78b9040, __x=0x612240) at /usr/include/c++/4.4.7/ext/new_allocator.h:115
#16 _M_destroy_node (this=0x7ffff78b9040, __x=0x612240) at /usr/include/c++/4.4.7/bits/stl_tree.h:383
#17 std::_Rb_tree<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<mongo::ReplicaSetMonitor> >, std::_Select1st<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<mongo::ReplicaSetMonitor> > >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<mongo::ReplicaSetMonitor> > > >::_M_erase (this=0x7ffff78b9040, __x=0x612240) at /usr/include/c++/4.4.7/bits/stl_tree.h:972
#18 0x00007ffff75f888d in ~_Rb_tree (this=<value optimized out>, __in_chrg=<value optimized out>) at /usr/include/c++/4.4.7/bits/stl_tree.h:614
#19 std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, boost::shared_ptr<mongo::ReplicaSetMonitor>, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, boost::shared_ptr<mongo::ReplicaSetMonitor> > > >::~map (this=<value optimized out>, __in_chrg=<value optimized out>)
at /usr/include/c++/4.4.7/bits/stl_map.h:87
#20 0x00007ffff6d6a38d in __cxa_finalize (d=0x7ffff78b11c0) at cxa_finalize.c:56
#21 0x00007ffff75be8c6 in __do_global_dtors_aux () from /usr/lib64/libmongoclient.so
#22 0x00007fffffffe4f0 in ?? ()
#23 0x00007fffffffe630 in ?? ()
#24 0x00007ffff767af21 in _fini () from /usr/lib64/libmongoclient.so
#25 0x00007fffffffe630 in ?? ()
#26 0x00007ffff7ded31d in _dl_fini () at dl-fini.c:259
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Daniel

Comment by Andrew Morrow (Inactive) [ 18/Jun/13 ]

Hi Daniel -

I wonder if you may be affected by SERVER-8891. Would you be able to test the little example program at the beginning of that ticket in your environment and see if you observe similar behavior? The stack traces aren't a perfect match between that ticket and this one, but there is enough similarity that I'm curious.

As to your question about instrumentation, one possibility would be to use valgrind to see if there is a static destructor ordering problem, which would almost certainly show up as a use-after-free or some other similar memory error.

Thanks,
Andrew

Comment by Daniel Stefan Tacalau [ 18/Jun/13 ]

Andrew,

This hasn't reproduced so far.

1. We're using C++ driver 2.2.3 from the CentOS 6.4 repo.
2. The Boost version is 1.41.
3. The crash occurs only sometimes.
4. I don't have any other client logs.

Is there any kind of instrumentation that I can enable in the C++ driver so that when the crash occurs again I'll have something to report?

Thanks for looking into this,
Daniel

Comment by Andrew Morrow (Inactive) [ 17/Jun/13 ]

Hi Daniel -

Just checking in to see if this is still an issue for you.

Thanks,
Andrew

Comment by Andrew Morrow (Inactive) [ 12/Jun/13 ]

Hi Daniel -

Could you provide some additional details to help us diagnose the issue?

  • Did you build the driver from the C++ driver source tarballs? If so, what version? If not, how did you obtain the driver library?
  • The driver appears to be built against boost from /usr/include. What version of boost is in place on the system?
  • Is this crash consistently reproducible when you terminate the server, or does it occur only sometimes?
  • If you can reproduce this consistently, does the client program produce any logging output in the time leading up to the crash? If not, do you happen to have archived client logs from when this crash occurred?

Thanks,
Andrew
