[SERVER-10427] segfault when calling mongo::ScopedDbConnection::getScopedDbConnection(connection_string) with replicaset Created: 04/Aug/13  Updated: 06/Aug/13  Resolved: 06/Aug/13

Status: Closed
Project: Core Server
Component/s: Internal Client
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Ingo Schramm
Assignee: Tad Marshall
Resolution: Done
Votes: 0
Labels: connection, crash, driver, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux Ubuntu, Redhat


Operating System: Linux
Steps To Reproduce:

With a bit of bad luck (or perhaps only on Red Hat), this code is sufficient.

  // conn_str like: rs-name/host1:1001,host2:1001,host3:1001
  // I tried to replace hostnames with IP but it does not help
 
  std::string errors;
  const auto cs = mongo::ConnectionString::parse(conn_str, errors);
  if (! cs.isValid() )
  {
    return false;
  }
 
  // here I keep the default socketTimeout=0
  auto conn_ptr = mongo::ScopedDbConnection::getScopedDbConnection(cs); // line 137, segfault

Participants:

 Description   

Hi,

To my surprise, I get a segfault when I connect to a MongoDB 2.4.3 replica set with C++ driver 2.4.1. The replica set consists of 1 master and 2 slaves.

It is surprising because exactly the same code works well in most environments (Ubuntu, Red Hat), and even works in the failing Red Hat environment when run from another application against the same replica set.

Code (executed in a lambda):

  // conn_str like: rs-name/host1:1001,host2:1001,host3:1001
  // I tried to replace hostnames with IP but it does not help
 
  std::string errors;
  const auto cs = mongo::ConnectionString::parse(conn_str, errors);
  if (! cs.isValid() )
  {
    return false;
  }
 
  // here I keep the default socketTimeout=0
  auto conn_ptr = mongo::ScopedDbConnection::getScopedDbConnection(cs); // line 137, segfault

Stacktrace:

#0  0x00007ffff7b20d3d in inet_pton () from ~/mdbtest-pkg/lib/libc-2.15.so
#1  0x00007ffff7ae8e10 in ?? () from ~/mdbtest-pkg/lib/libc-2.15.so
#2  0x00007ffff7aec99e in getaddrinfo () from ~/mdbtest-pkg/lib/libc-2.15.so
#3  0x000000000190008a in mongo::SockAddr::SockAddr(char const*, int) ()
#4  0x00000000018adadb in mongo::DBClientConnection::_connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#5  0x00000000018adeb4 in mongo::DBClientConnection::connect(mongo::HostAndPort const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#6  0x00000000018ab8f0 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#7  0x00000000018c5313 in mongo::ReplicaSetMonitor::_populateHosts_inSetsLock(std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#8  0x00000000018c5996 in mongo::ReplicaSetMonitor::ReplicaSetMonitor(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#9  0x00000000018c657b in mongo::ReplicaSetMonitor::createIfNeeded(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#10 0x00000000018c6726 in mongo::DBClientReplicaSet::DBClientReplicaSet(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&, double) ()
#11 0x00000000018ab935 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#12 0x00000000018a7814 in mongo::DBConnectionPool::get(mongo::ConnectionString const&, double) ()
#13 0x00000000018a4be7 in mongo::ScopedDbConnection::getScopedDbConnection(mongo::ConnectionString const&, double) ()
#14 0x0000000000f67ebe in operator() (__closure=<value optimized out>) at persistermdb.cpp:137
#15 0x0000000000f67ff1 in _Function_handler<bool, PersisterMDB::connect()::<lambda()> >::_M_invoke(const _Any_data &) (
    __functor=<value optimized out>) at /opt/c1/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/functional:1912
[...]
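
Since the top frames of the trace are inside getaddrinfo() and inet_pton() rather than in driver code, one way to narrow this down is to call the resolver directly from the affected process, outside the driver. The following is a minimal sketch, assuming a POSIX system; resolve_host is a hypothetical helper name and not part of the MongoDB driver.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>

#include <cstring>
#include <string>

// Hypothetical helper (not part of the MongoDB driver): resolves host:port
// with getaddrinfo() and returns true on success. On failure, `error`
// receives the gai_strerror() text for the returned code.
bool resolve_host(const std::string& host, int port, std::string& error)
{
    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP, as used for a mongod connection

    const std::string service = std::to_string(port);
    addrinfo* result = nullptr;
    const int rc = getaddrinfo(host.c_str(), service.c_str(), &hints, &result);
    if (rc != 0)
    {
        error = gai_strerror(rc);
        return false;
    }
    freeaddrinfo(result);
    return true;
}
```

Calling resolve_host("host1", 1001, err) at the point where the connection would be opened exercises the same libc path. If this call also crashes, the driver is unlikely to be the cause.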

You may note the libc-2.15, which I load via LD_PRELOAD because the test machine has a different libc version than the build machine. Both the application and the driver are built against libc-2.15, of course, and the preload works well with another app that uses exactly the same code.
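
To confirm that the preloaded libc is the one actually in use, the build-time and run-time glibc versions can be compared from inside the process. This is a sketch that assumes glibc (it relies on gnu_get_libc_version() and the __GLIBC__ macros); the function names are illustrative only.

```cpp
#include <gnu/libc-version.h>

#include <string>

// Version of glibc whose headers the binary was compiled against, e.g. "2.15".
std::string build_glibc_version()
{
    return std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__);
}

// Version of glibc actually loaded into the process at run time.
std::string runtime_glibc_version()
{
    return gnu_get_libc_version();
}

// True when the running libc matches the one used at build time.
bool glibc_versions_match()
{
    return build_glibc_version() == runtime_glibc_version();
}
```

Logging both values at startup would show whether the LD_PRELOAD took effect in the crashing application as well as in the working one.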

Any hints appreciated.

Cheers
Ingo



 Comments   
Comment by Ingo Schramm [ 06/Aug/13 ]

Obviously not. Just close it - and thanks again.

But you could give the SConstruct patch mentioned above a try.

Cheers,
Ingo

Comment by Tad Marshall [ 06/Aug/13 ]

Hi Ingo,

Is this not a problem with MongoDB then? Should we close this ticket?

Tad

Comment by Ingo Schramm [ 06/Aug/13 ]

Sorry, this turned out to be my fault.

I tracked the problem down to a shared library that was being loaded from a different location than I expected.
Changing the RPATH in my binary helped. The stacktrace was misleading.

Thanks for your help anyways!

Cheers
Ingo
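
For the kind of diagnosis described in the comment above (a shared library loaded from an unexpected location), the process can list the paths of all objects the dynamic loader has mapped. The sketch below uses glibc's dl_iterate_phdr(); it assumes Linux with glibc, and loaded_shared_objects is a hypothetical helper name.

```cpp
#include <link.h>

#include <cstddef>
#include <string>
#include <vector>

// Callback for dl_iterate_phdr(): appends the path of each loaded object.
static int collect_object(dl_phdr_info* info, std::size_t /*size*/, void* data)
{
    auto* paths = static_cast<std::vector<std::string>*>(data);
    // The main executable is usually reported with an empty name; skip it.
    if (info->dlpi_name != nullptr && info->dlpi_name[0] != '\0')
    {
        paths->push_back(info->dlpi_name);
    }
    return 0;  // a non-zero return value would stop the iteration
}

// Hypothetical helper: returns the paths of all shared objects currently
// mapped into this process, as reported by the dynamic loader.
std::vector<std::string> loaded_shared_objects()
{
    std::vector<std::string> paths;
    dl_iterate_phdr(collect_object, &paths);
    return paths;
}
```

Printing this list just before the connection attempt shows the directory each library was actually loaded from, which can then be compared with the RPATH and LD_PRELOAD settings.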

Comment by Ingo Schramm [ 05/Aug/13 ]

Upgraded driver to 2.4.5, result is exactly the same. Running locally against RS on mongod 2.4.5 - no crash. Running remotely against RS on mongod 2.4.3 - crash.

#0 0x00007ffff7b20d3d in inet_pton () from /home/ingo.schramm/mdbtest-pkg/lib/libc-2.15.so
#1 0x00007ffff7ae8e10 in ?? () from /home/ingo.schramm/mdbtest-pkg/lib/libc-2.15.so
#2 0x00007ffff7aec99e in getaddrinfo () from /home/ingo.schramm/mdbtest-pkg/lib/libc-2.15.so
#3 0x00000000019000da in mongo::SockAddr::SockAddr(char const*, int) ()
#4 0x00000000018adb2b in mongo::DBClientConnection::_connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#5 0x00000000018adf04 in mongo::DBClientConnection::connect(mongo::HostAndPort const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#6 0x00000000018ab940 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#7 0x00000000018c5363 in mongo::ReplicaSetMonitor::_populateHosts_inSetsLock(std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#8 0x00000000018c59e6 in mongo::ReplicaSetMonitor::ReplicaSetMonitor(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#9 0x00000000018c65cb in mongo::ReplicaSetMonitor::createIfNeeded(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&) ()
#10 0x00000000018c6776 in mongo::DBClientReplicaSet::DBClientReplicaSet(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> > const&, double) ()
#11 0x00000000018ab985 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#12 0x00000000018a7864 in mongo::DBConnectionPool::get(mongo::ConnectionString const&, double) ()
#13 0x00000000018a4c37 in mongo::ScopedDbConnection::getScopedDbConnection(mongo::ConnectionString const&, double) ()
#14 0x0000000000f67efe in operator() (__closure=<value optimized out>) at persistermdb.cpp:139

BTW I had to patch the SConstruct file line 85:

from: boostLibs = ["thread", "filesystem", "system"]
to: boostLibs = ["thread", "system", "filesystem"]

Cheers
Ingo

Comment by Ingo Schramm [ 05/Aug/13 ]

No, I do not touch ReplicaSetMonitor at all. Also, the crash happens before I'd get the chance to do so.

I will fire local tests against newer versions of everything, but it may take some time to set it all up. Unfortunately, I cannot reproduce the crash locally (yet), only in the data center.

Cheers
Ingo

Comment by Tad Marshall [ 05/Aug/13 ]

Hi Ingo,

Thanks for all the answers!

Do you call ReplicaSetMonitor::remove(), or use ReplicaSetMonitor directly in any way? It will be used "on your behalf" internally, but direct calls to it are capable of causing problems.

You are right that several of the issues we've identified are related to exiting while active replica sets exist. We may have those issues fixed in the latest master branch code, but we also may still have remaining issues there. There are definitely issues of that type in the 2.4.1 and 2.4.3 versions.

We have separate issues in one of our unit tests, but that unit test uses a "mock" to simulate a replica set and we haven't yet narrowed down the problem to a specific location. It could be in code that all replica sets use or it could be in the mock only. SERVER-8707 is the ticket for the unit test crashes.

Tad

Comment by Ingo Schramm [ 05/Aug/13 ]

Hi Tad!

Yes, exactly 2.4.1 of the C++ driver with version 2.4.3 of MongoDB on the machine where it crashes. The installation is quite complex and quite big.

I will give 2.4.5 of the driver a try today. This is quite fun, since we build against our own SDK, and integrating scons into our CMake/make landscape is a task of its own.

Build OS:

Linux 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 12.04.2 LTS
test here against MongoDB v2.2.3 compiled from source
no crash

GCC g++ (GCC) 4.7.2

Build command like (test build):

g++ -D_DEBUG -D_GNU_SOURCE -D_STDC_LIMIT_MACROS -D_STDC_CONSTANT_MACROS -finput-charset=UTF-8 -std=c++0x -Wall -Wextra -Werror -pedantic -O2 -g -gstrict-dwarf -fno-inline -fno-inline-functions -fPIC -fopenmp -isystem(SDK stuff) -I(Src stuff)

The C++ driver is also built like that against the SDK.

Test OS:

Linux 2.6.32-358.6.1.el6.x86_64 #1 SMP Fri Mar 29 16:51:51 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.4 (Santiago)
with libc-2.15.so libgomp.so.1.0.0 libstdc++.so.6.0.16 from above with LD_PRELOAD
test here against MongoDB v2.4.3, crash in complex application, no crash in simple application, both share same MongoDB related code

All other libraries are SDK (namely boost 1.48.0).

Yes, we use C++11 all over the place, makes life a lot easier.

The issues you mentioned are mostly related to thread destruction, right? As far as I can see, this one is more related to connecting at startup. The application in question is meant to run forever. Indeed, it uses a lot of threading, and the driver is called from within a boost thread.

The last (and only) log from the driver is like:

Fri Aug 2 17:09:38.901 starting new replica set monitor for replica set rs-name with seed of host1:1001,host2:1001,host3:1001

Hope that may help.

Cheers,
Ingo

Comment by Tad Marshall [ 04/Aug/13 ]

Hi Ingo,

Thanks for the report.

Are you using version 2.4.1 of the C++ driver with version 2.4.3 of the MongoDB server as you stated? What is the reason for using two different versions? This is probably unrelated to the issue you're hitting, but to reproduce your problem we should probably use the exact versions that you are using.

Why not use the latest 2.4.5 version?

You are using "auto" in your sample code, which suggests C++11. We have ongoing work to bring the codebase to perfect compatibility with C++11, but we currently do not use features of C++11 and we do not build in C++11 mode for our released versions. What compiler version are you using?

gcc --version

What is the command line that you are using to build your program (including all options you are specifying)?

Can you give us the precise OS version you are using to test this code?

uname -a

lsb_release -a

We have had some issues related to using replica sets from programs using the C++ driver: SERVER-8707, SERVER-10372 and SERVER-8891. Some of the issues are better in the latest master branch code. Can you test your code with the latest master branch code and see if the behavior is better?

Thanks!

Tad
