[SERVER-25029] Segmentation fault in mongos when config servers not available Created: 13/Jul/16  Updated: 18/Dec/17  Resolved: 22/Jul/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.8
Fix Version/s: 3.2.9

Type: Bug Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Misha Tyulenev
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 14.04 x64 on Hyper-V 2012R2 w/Docker 1.12


Attachments: Text File mongos-0.log     Text File mongos-1.log    
Issue Links:
Duplicate
is duplicated by SERVER-25161 MongoDB sharding data node crashed wh... Closed
is duplicated by SERVER-25317 [mongosMain] Invalid access at addres... Closed
Related
is related to SERVER-7553 mongos crashes when configdb is not l... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Create a CSRS (config server replica set) and start mongos:
/usr/bin/mongos --configdb config/config-0:27017,config-1:27017,config-2:27017

Sprint: Sharding 18 (08/05/16)

 Description   

Mongos crashes on startup occasionally (3 out of 10 times so far).
Logs are attached.



 Comments   
Comment by Githook User [ 22/Jul/16 ]

Author:

Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-25029 fix segfault in mongos when config servers are not available
Branch: v3.2
https://github.com/mongodb/mongo/commit/6db34b27828b6ed48bf2002120afc5a4078789a0

Comment by Misha Tyulenev [ 21/Jul/16 ]

Pushed the test case to master, but the fix applies only to v3.2.

Comment by Githook User [ 21/Jul/16 ]

Author:

Misha Tyulenev (mikety) <misha@mongodb.com>

Message: SERVER-25029 fix segfault in mongos when config servers are not available
Branch: master
https://github.com/mongodb/mongo/commit/2e1425f66f7965c6de51519f33ac08bbfa229787

Comment by Misha Tyulenev [ 20/Jul/16 ]

The code that reads the isMaster response for the CSRS does not exist on the master branch: it was removed along with SCCC. Hence the issue exists only in 3.2.

Comment by Andy Schwerin [ 20/Jul/16 ]

Also, misha, your diagnosis appears to be correct. What I don't know right now is why the crash isn't happening on master.

Comment by Andy Schwerin [ 20/Jul/16 ]

misha.tyulenev, perhaps you were trying to reproduce this on the master branch, where it does not appear. I can definitely reproduce this on the 3.2 branch using onyxmaster's described reproduction. Actually, when you launch the mongos, you can just pass --configdb config/localhost:10001.

Comment by Aristarkh Zagorodnikov [ 20/Jul/16 ]

I'm not sure what the problem with reproducing this is. It's 100% reproducible in virtual, physical, and containerized environments, at least on Ubuntu 14.04 and 16.04. You don't even need any real servers with data; in fact, a single server is enough.

Preparation:

rm -rf /tmp/c1 ; mkdir /tmp/c1 ; /usr/bin/mongod --port 10001 --dbpath /tmp/c1 --configsvr --replSet config &

Repro:

/usr/bin/mongos --configdb config/localhost:10001,localhost:10002,localhost:10003

Comment by Misha Tyulenev [ 19/Jul/16 ]

I could not reproduce it, but by analyzing the code I think I found the issue. When mongos connects to the config servers, it expects the isMaster response to contain a "setName" field, which is present only for initiated replica sets. If the CSRS is still initiating, the field is missing and mongos segfaults.

Comment by Ramon Fernandez Marina [ 13/Jul/16 ]

I'm using mtools:

mlaunch init --configsvr --replicaset --nodes 3 --port 30000; sleep 1 ; mongos --configdb config/localhost:30000,localhost:30001,localhost:30002

mongos indeed tries to connect before the replica set is ready.

Comment by Andy Schwerin [ 13/Jul/16 ]

ramon.fernandez, could you attach instructions for your reproduction, or better yet, a script?

Comment by Kaloian Manassiev [ 13/Jul/16 ]

This was very likely introduced by this commit: https://github.com/mongodb/mongo/commit/ab8867291f5da498ab96fbc850db51d573bd0c2b

On this line we try to construct a ConnectionString from the isMaster response. According to its comment, valueStringData does not check the element's type:

    /**
     * Returns a StringData pointing into this element's data.  Does not validate that the
     * element is actually of type String.
     */
    const StringData valueStringData() const {
        return StringData(valuestr(), valuestrsize() - 1);
    }

So the hypothesis is that we somehow got an empty setName because the CSRS nodes were still starting up, and then tried to read that empty BSON element as a string.

Comment by Ramon Fernandez Marina [ 13/Jul/16 ]

Here's the parsed stack trace:

?? ??:0
mongo::printStackTrace(std::ostream&) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/util/stacktrace_posix.cpp:172
mongo::(anonymous namespace)::printSignalAndBacktrace(int) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/util/signal_handlers_synchronous.cpp:182
mongo::(anonymous namespace)::abruptQuitWithAddrSignal(int, siginfo_t*, void*) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/util/signal_handlers_synchronous.cpp:277
?? ??:0
?? ??:0
std::char_traits<char>::copy(char*, char const*, unsigned long) at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/char_traits.h:271
_Alloc_hider at /data/mci/8b24fd5eb2023a18a85d7c8d7c44dac1/toolchain-builder/build-gcc-v1.sh-Bi5/x86_64-mongodb-linux/libstdc++-v3/include/bits/basic_string.h:275
mongo::ConnectionString::ConnectionString(mongo::StringData, std::vector<mongo::HostAndPort, std::allocator<mongo::HostAndPort> >) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/connection_string.cpp:47
~vector at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_vector.h:416
operator= at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/connection_string.h:61
std::_List_iterator<mongo::DBConnectionHook*>::operator++(int) at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/stl_list.h:163
mongo::DBConnectionPool::_finishCreate(std::string const&, double, mongo::DBClientBase*) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/connpool.cpp:216
std::string::_M_data() const at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:293
mongo::ScopedDbConnection::ScopedDbConnection(mongo::ConnectionString const&, double) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/connpool.cpp:461
~basic_string at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:539
mongo::ReplicaSetMonitor::getHostOrRefresh(mongo::ReadPreferenceSetting const&, std::chrono::duration<long, std::ratio<1l, 1000l> >) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/replica_set_monitor.cpp:291
mongo::RemoteCommandTargeterRS::findHost(mongo::ReadPreferenceSetting const&, std::chrono::duration<long, std::ratio<1l, 1000l> >) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/client/remote_command_targeter_rs.cpp:64
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
~basic_string at /opt/mongodbtoolchain/v1/include/c++/4.8.2/bits/basic_string.h:539
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
mongo::initializeGlobalShardingState(mongo::OperationContext*, mongo::ConnectionString const&, bool) at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/s/sharding_initialization.cpp:191
mongo::Status::code() const at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/base/status-inl.h:72
main at /data/mci/bbd2297f0f9f6fbec1838fe1f9e1cb92/src/src/mongo/s/server.cpp:495
?? ??:0
_start at ??:?

Comment by Ramon Fernandez Marina [ 13/Jul/16 ]

Thanks for your report onyxmaster, we're investigating.

EDIT: I can reproduce this easily; it is indeed the case that it happens when the config servers are not yet available when mongos starts.

Comment by Aristarkh Zagorodnikov [ 13/Jul/16 ]

It appears to fail when the CSRS is not available yet at startup time.

Generated at Thu Feb 08 04:08:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.