[SERVER-8891] Simple client fail with segmentation fault in mongoclient library
Created: 07/Mar/13  Updated: 11/Jul/16  Resolved: 25/Jul/13

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.2.2, 2.2.3
Fix Version/s: 2.4.6, 2.5.2

Type: Bug
Priority: Critical - P2
Reporter: Stanislav Ievlev
Assignee: Tad Marshall
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux
Issue Links:
Related
related to SERVER-10372 ReplicaSetMonitor creates a thread th... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Example code below fails with segmentation fault at exit:

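#include <vector>
#include <mongo/client/dbclient.h>

using namespace mongo;

int main() {
   // Seed list for the replica set: a single local host.
   std::vector<HostAndPort> hosts;
   hosts.push_back(HostAndPort("localhost"));

   // "c2" is the replica set name.
   DBClientReplicaSet connection("c2", hosts);
   connection.connect();
   // The segmentation fault occurs after main() returns.
}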



 Comments   
Comment by Gerry F [ 22/Aug/13 ]

The official 2.4.6 release has fixed this bug.

Comment by Gerry F [ 29/Jul/13 ]

The above fix appears to solve this specific crash in my testing.

SERVER-8707 and SERVER-10372 describe similar crashes, both related to ReplicaSetMonitorWatcher.

Comment by auto [ 29/Jul/13 ]

Author: Tad Marshall (tadmarshall) <tad@10gen.com>

Message: SERVER-8891 Destroy static objects in a safer order

Change the order of some static objects and group them near the
start of the source file, along with a comment explaining what is
going on. The crashes we've seen have been due to destructors for
ReplicaSetMonitors (triggered by the destruction of _sets) trying
to use the _seedServers map, which had been destroyed already. By
changing the order of the object definitions, we destroy _sets
before destroying _seedServers, preventing the crash.

There may be other cases that are not solved by this fix, and there
is still a race due to the running ReplicaSetMonitorWatcher thread,
so this is unlikely to be the last word on crashes of this type.
Branch: v2.4
https://github.com/mongodb/mongo/commit/07fd444dd6d3be33223b0324a78570b75c8d3d31
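
The ordering rule this fix relies on can be illustrated with a small sketch. Within one translation unit, static objects are ordinarily initialized in definition order and destroyed in the reverse order; block-scope objects follow the same reverse-of-construction rule, which makes the effect easy to observe inside a function. The `Tracked` type and `destruction_order()` function below are hypothetical stand-ins, not MongoDB code; only the names `_sets` and `_seedServers` are taken from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a container whose destruction we want to observe.
// It appends its name to a shared log when it is destroyed.
struct Tracked {
    std::string name;
    std::vector<std::string>* log;

    Tracked(const std::string& n, std::vector<std::string>* l) : name(n), log(l) {
        assert(l != 0);  // the log must outlive every Tracked object
    }
    ~Tracked() { log->push_back(name); }
};

// Returns the order in which the two objects were destroyed.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    {
        Tracked seedServers("_seedServers", &log);  // constructed first, destroyed last
        Tracked sets("_sets", &log);                // constructed second, destroyed first
    }  // both destructors run here, in reverse order of construction
    return log;
}
```

Calling `destruction_order()` yields `_sets` followed by `_seedServers`: the object defined later is destroyed first, so its destructor can still safely use the one defined earlier.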

Comment by auto [ 25/Jul/13 ]

Author: Tad Marshall (tadmarshall) <tad@10gen.com>

Message: SERVER-8891 Destroy static objects in a safer order

Change the order of some static objects and group them near the
start of the source file, along with a comment explaining what is
going on. The crashes we've seen have been due to destructors for
ReplicaSetMonitors (triggered by the destruction of _sets) trying
to use the _seedServers map, which had been destroyed already. By
changing the order of the object definitions, we destroy _sets
before destroying _seedServers, preventing the crash.

There may be other cases that are not solved by this fix, and there
is still a race due to the running ReplicaSetMonitorWatcher thread,
so this is unlikely to be the last word on crashes of this type.
Branch: master
https://github.com/mongodb/mongo/commit/50dc157e0d617b3aa9014bf47b9531e6e510912b

Comment by Gerry F [ 22/Jul/13 ]

I have this bug in 2.2.0, 2.2.4, and 2.4.5 C++ clients.

In my debugger, the first reference to the ReplicaSetMonitor is destroyed correctly. When the second 'shared' reference tries to destroy the same object, the memory has already been freed.

The second reference is made when the ReplicaSetMonitor's BackgroundJob creates a heartbeat thread.

Another way to exercise this bug is to use mongo::ReplicaSetMonitor::remove() to destroy the ReplicaSetMonitor, and then wait up to 10 seconds for the heartbeat thread to reference the freed memory.

There is a race condition, as the process occasionally exits successfully, more often on Windows than on Linux.

My first idea for a fix would be to eliminate the BackgroundJob and the stray thread.
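
As an alternative to removing the thread, a heartbeat could hold only a weak reference to the monitor and check that the monitor is still alive before using it. The sketch below simulates one heartbeat step in a single thread so the behavior is deterministic. It is not the fix that was applied to the client library. `Monitor` and `heartbeat_once()` are hypothetical names, and `std::weak_ptr` assumes C++11 (the valgrind trace shows the library using `boost::shared_ptr`, which has a `boost::weak_ptr` counterpart).

```cpp
#include <memory>

// Hypothetical monitor type; not the real mongo::ReplicaSetMonitor.
struct Monitor {
    int checks;
    Monitor() : checks(0) {}
    void check() { ++checks; }
};

// One heartbeat step. The heartbeat holds only a weak reference, so it must
// promote it to a shared_ptr before touching the monitor.
// Returns true if the monitor was still alive and was checked.
bool heartbeat_once(const std::weak_ptr<Monitor>& weak) {
    if (std::shared_ptr<Monitor> monitor = weak.lock()) {
        monitor->check();  // the monitor cannot be freed while `monitor` is in scope
        return true;
    }
    return false;  // monitor already destroyed: nothing to dereference
}
```

With a live owner, `heartbeat_once()` returns true and increments `checks`. Once the last `shared_ptr` owner is reset, the same call returns false instead of touching freed memory.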

Comment by Erik Snider [ 17/Jul/13 ]

I just tested it on 2.5.1-pre-, and the problem still exists. I ran the example program through valgrind, and this blurb looks like it may be useful:

==17087== Invalid free() / delete / delete[] / realloc()
==17087==    at 0x4C2A44B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17087==    by 0x552253F: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.17)
==17087==    by 0x413C89: mongo::HostAndPort::~HostAndPort() (in /home/developer/Desktop/tutorial)
==17087==    by 0x4149E1: void std::_Destroy<mongo::HostAndPort>(mongo::HostAndPort*) (in /home/developer/Desktop/tutorial)
==17087==    by 0x414875: void std::_Destroy_aux<false>::__destroy<mongo::HostAndPort*>(mongo::HostAndPort*, mongo::HostAndPort*) (in /home/developer/Desktop/tutorial)
==17087==    by 0x41BDBE: mongo::ReplicaSetMonitor::_cacheServerAddresses_inlock() (stl_construct.h:128)
==17087==    by 0x41BED2: mongo::ReplicaSetMonitor::~ReplicaSetMonitor() (dbclient_rs.cpp:391)
==17087==    by 0x431DB1: boost::detail::sp_counted_impl_p<mongo::ReplicaSetMonitor>::dispose() (checked_delete.hpp:34)
==17087==    by 0x430804: std::_Rb_tree<std::string, std::pair<std::string const, boost::shared_ptr<mongo::ReplicaSetMonitor> >, std::_Select1st<std::pair<std::string const, boost::shared_ptr<mongo::ReplicaSetMonitor> > >, std::less<std::string>, std::allocator<std::pair<std::string const, boost::shared_ptr<mongo::ReplicaSetMonitor> > > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, boost::shared_ptr<mongo::ReplicaSetMonitor> > >*) (sp_counted_base_gcc_x86.hpp:146)
==17087==    by 0x5ED3900: __run_exit_handlers (exit.c:78)
==17087==    by 0x5ED3984: exit (exit.c:100)
==17087==    by 0x5EB9773: (below main) (libc-start.c:258)
==17087==  Address 0x6472b10 is 0 bytes inside a block of size 34 free'd
==17087==    at 0x4C2A44B: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17087==    by 0x4309CB: std::_Rb_tree<std::string, std::pair<std::string const, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > >, std::_Select1st<std::pair<std::string const, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > > >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > > > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > > >*) (basic_string.h:246)
==17087==    by 0x5ED3900: __run_exit_handlers (exit.c:78)
==17087==    by 0x5ED3984: exit (exit.c:100)
==17087==    by 0x5EB9773: (below main) (libc-start.c:258)

Comment by Stanislav Ievlev [ 27/May/13 ]

I've tested on 2.4.3.

Problem still exists.

Comment by Stanislav Ievlev [ 28/Mar/13 ]

Sorry, I don't have time to do that.
Could you test it yourselves on your own test systems?

Comment by Ian Whalen (Inactive) [ 26/Mar/13 ]

Hi Stanislav, have you been able to try our most recent stable release (2.4.1) to see if the problem goes away?

Comment by Tad Marshall [ 10/Mar/13 ]

SERVER-8226 is marked as fixed in version 2.4.0-rc1, so the bug is still present in versions 2.2.2 and 2.2.3.

The call to _exit() works around a problem with static constructors and destructors. The C++ standard does not specify the order in which static objects defined in different translation units are constructed, and destructors run in the reverse of construction order, so we can segfault on exit when objects get destroyed in the "wrong" order. This is a separate problem from making sure that the "global initializers" have run before using the DBClientReplicaSet class. The symptom of this other problem is that everything works great right up until you call exit() (no underscore) or return from main(), at which point you get a segfault instead of a clean exit. This is a bug, and the calls to _exit() are just a way to avoid hitting it.
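
For applications that cannot call _exit(), one generic C++ idiom for avoiding destruction-order crashes is to construct a global on first use and never destroy it. The sketch below illustrates that idiom only; it is not what the MongoDB client library does, and `Registry`, `registry()`, and `add_user()` are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical registry mapping a replica set name to a use count.
typedef std::map<std::string, int> Registry;

// Construct-on-first-use accessor. The map is allocated once and deliberately
// never deleted, so no destructor runs for it during exit().
// Note: initialization of a function-local static is thread-safe only from C++11 on.
Registry& registry() {
    static Registry* instance = new Registry();
    return *instance;
}

// Records one more user of the named set and returns the new count.
int add_user(const std::string& setName) {
    return ++registry()[setName];
}
```

Every call to `registry()` returns the same map, and the map stays valid for any code that runs during exit. The trade-off is that its destructor never runs, so this only suits objects whose cleanup can be left to the operating system.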

Can you try a version 2.4.0-rc1 or newer (such as 2.4.0-rc2) and see if your problem goes away? Does your program then work fine, but segfault on exit?

Comment by Stanislav Ievlev [ 10/Mar/13 ]

I've tested this sample with both versions 2.2.2 and 2.2.3.

I haven't tried calling runGlobalInitializersOrDie(), but according to the last comment in SERVER-8226 I don't need to.

Where can I find an example of the correct usage of the DBClientReplicaSet class?

I've seen that you use an ugly hack with _exit() in your command-line utilities, but I cannot use the same hack in my application.

Comment by sam.helman@10gen.com [ 08/Mar/13 ]

Hello,

Have you tried calling runGlobalInitializersOrDie, as specified in SERVER-7729 and SERVER-8226? Additionally, you listed 2.2.2 and 2.2.3 under "Affects Version/s"; have you experienced the error on both?

Comment by Ian Whalen (Inactive) [ 07/Mar/13 ]

Moving from Blocker to Critical, as this does not appear to be a release-blocking issue.
