[SERVER-4891] mongos crash on signal 11 Created: 07/Feb/12  Updated: 30/Mar/12  Resolved: 12/Feb/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Anton Batenev Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: crash, mongos, replicaset, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian squeeze (amd64); sharded cluster with 2 replica sets (each primary/secondary/arbiter) and 3 config servers.


Operating System: Linux
Participants:

 Description   

mongos occasionally crashes with signal 11 (segmentation fault).

  $ mongos --version
    Tue Feb 7 07:57:44 mongos db version v2.0.2, pdfile version 4.5 starting (--help for usage)
    Tue Feb 7 07:57:44 git version: 514b122d308928517f5841888ceaa4246a7f18e3
    Tue Feb 7 07:57:44 build info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41

Backtraces below:

— 1 —
Core was generated by `/usr/bin/mongos --ipv6 --configdb x.x.x.x:27018,y.y.y.y:27018,z.z.z.z:27018'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f4eac594d37 in ?? () from /lib/libgcc_s.so.1
(gdb) bt
#0 0x00007f4eac594d37 in ?? () from /lib/libgcc_s.so.1
#1 0x00007f4eac5955be in _Unwind_Backtrace () from /lib/libgcc_s.so.1
#2 0x00007f4eac3070ee in backtrace () from /lib/libc.so.6
#3 0x000000000054de95 in formattedBacktrace (signalNum=11) at util/signal_handlers.cpp:93
#4 mongo::printStackAndExit (signalNum=11) at util/signal_handlers.cpp:115
#5 <signal handler called>
#6 0x00000000000000e0 in ?? ()
#7 0x0000000001b10848 in ?? ()
#8 0x0000000001c15f70 in ?? ()
#9 0x0000000001c15f70 in ?? ()
#10 0x00007f4eac29994c in free () from /lib/libc.so.6
#11 0x00000000005800b2 in checked_delete<mongo::DBClientConnection> (this=0x1bcd930)
at /opt/extra/include/boost/checked_delete.hpp:34
#12 ~scoped_ptr (this=0x1bcd930) at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:80
#13 boost::scoped_ptr<mongo::DBClientConnection>::reset (this=0x1bcd930)
at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:86
#14 mongo::DBClientReplicaSet::isntMaster (this=0x1bcd930) at client/dbclient_rs.cpp:674
#15 0x000000000057e5c5 in mongo::DBClientConnection::runCommand (this=0x1bcd998, dbname=<value optimized out>,
cmd=<value optimized out>, info=<value optimized out>, options=-1543491552) at client/dbclient.cpp:613
#16 0x000000000074a654 in mongo::setShardVersion (conn=..., ns=..., version=..., authoritative=false, result=...)
at s/chunk.cpp:1037
#17 0x00000000007f3d2d in checkShardVersion (conn_in=<value optimized out>, ns=..., authoritative=false, tryNumber=1)
at s/shard_version.cpp:257
#18 0x00000000005c1e06 in boost::detail::function::function_invoker4<bool (mongo::DBClientBase&, std::string const&, bool, int), bool, mongo::DBClientBase&, std::string const&, bool, int>::invoke (function_ptr=..., a0=..., a1=..., a2=true, a3=12)
at /opt/extra/include/boost/function/function_template.hpp:95
#19 0x00000000005c284e in boost::function4<bool, mongo::DBClientBase&, std::string const&, bool, int>::operator() (
this=0x19ab9c0, ns=...) at /opt/extra/include/boost/function/function_template.hpp:1013
#20 mongo::ClientConnections::checkVersions (this=0x19ab9c0, ns=...) at s/shardconnection.cpp:155
#21 0x00000000005c03b0 in mongo::ClientConnections::_check (this=0x7f4e967425a0) at s/shardconnection.cpp:168
#22 mongo::ClientConnections::get (this=0x7f4e967425a0) at s/shardconnection.cpp:94
#23 mongo::ShardConnection::_init (this=0x7f4e967425a0) at s/shardconnection.cpp:207
#24 0x00000000005c0915 in ShardConnection (this=0x7f4e967425a0, s=<value optimized out>, ns=...) at s/shardconnection.cpp:197
#25 0x0000000000767487 in mongo::Strategy::insert (this=<value optimized out>, shard=...,
ns=0x1bccdc4 "db_name.collection_name", obj=..., flags=0, safe=false) at s/strategy.cpp:73
#26 0x000000000076f9da in mongo::ShardStrategy::_insert (this=0x19544c0, r=..., d=..., manager=...)
at s/strategy_shard.cpp:160
#27 0x0000000000776a83 in mongo::ShardStrategy::writeOp (this=0x19544c0, op=2002, r=...) at s/strategy_shard.cpp:492
#28 0x00000000007b487d in mongo::Request::process (this=0x7f4e96742b10, attempt=0) at s/request.cpp:151
#29 0x00000000007c6cf1 in mongo::ShardedMessageHandler::process (this=<value optimized out>, m=..., p=0x7f4e840357b0,
le=0x19ab970) at s/server.cpp:95
#30 0x00000000005e6a07 in mongo::pms::threadRun (inPort=0x7f4e840357b0) at util/net/message_server_port.cpp:74
#31 0x00007f4eacd378ca in start_thread () from /lib/libpthread.so.0
#32 0x00007f4eac2f286d in clone () from /lib/libc.so.6
#33 0x0000000000000000 in ?? ()
— /1 —

and

— 2 —
Program terminated with signal 11, Segmentation fault.
#0 0x00007f74b6e5fd37 in ?? () from /lib/libgcc_s.so.1
(gdb) bt
#0 0x00007f74b6e5fd37 in ?? () from /lib/libgcc_s.so.1
#1 0x00007f74b6e605be in _Unwind_Backtrace () from /lib/libgcc_s.so.1
#2 0x00007f74b6bd20ee in backtrace () from /lib/libc.so.6
#3 0x000000000054de95 in formattedBacktrace (signalNum=11) at util/signal_handlers.cpp:93
#4 mongo::printStackAndExit (signalNum=11) at util/signal_handlers.cpp:115
#5 <signal handler called>
#6 0x00000000000000e0 in ?? ()
#7 0x000000000169d0f8 in ?? ()
#8 0x00000000015e5830 in ?? ()
#9 0x00000000015e5830 in ?? ()
#10 0x00007f74b6b6494c in free () from /lib/libc.so.6
#11 0x00000000005800b2 in checked_delete<mongo::DBClientConnection> (this=0x15e5760)
at /opt/extra/include/boost/checked_delete.hpp:34
#12 ~scoped_ptr (this=0x15e5760) at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:80
#13 boost::scoped_ptr<mongo::DBClientConnection>::reset (this=0x15e5760)
at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:86
#14 mongo::DBClientReplicaSet::isntMaster (this=0x15e5760) at client/dbclient_rs.cpp:674
#15 0x000000000057e5c5 in mongo::DBClientConnection::runCommand (this=0x15e57c8, dbname=<value optimized out>,
cmd=<value optimized out>, info=<value optimized out>, options=-1342168224) at client/dbclient.cpp:613
#16 0x000000000074a654 in mongo::setShardVersion (conn=..., ns=..., version=..., authoritative=false, result=...)
at s/chunk.cpp:1037
#17 0x00000000007f3d2d in checkShardVersion (conn_in=<value optimized out>, ns=..., authoritative=false, tryNumber=1)
at s/shard_version.cpp:257
#18 0x00000000005c1e06 in boost::detail::function::function_invoker4<bool (mongo::DBClientBase&, std::string const&, bool, int), bool, mongo::DBClientBase&, std::string const&, bool, int>::invoke (function_ptr=..., a0=..., a1=..., a2=true, a3=12)
at /opt/extra/include/boost/function/function_template.hpp:95
#19 0x00000000005bffe7 in boost::function4<bool, mongo::DBClientBase&, std::string const&, bool, int>::operator() (
this=0x7f74a189dd80) at /opt/extra/include/boost/function/function_template.hpp:1013
#20 mongo::ShardConnection::_finishInit (this=0x7f74a189dd80) at s/shardconnection.cpp:217
#21 0x000000000079879b in mongo::dbgrid_pub_cmds::CountCmd::run (this=0xb15110, dbName=..., cmdObj=..., options=0,
errmsg=..., result=...) at s/shard.h:248
#22 0x0000000000793175 in mongo::Command::runAgainstRegistered (ns=<value optimized out>, jsobj=..., anObjBuilder=...,
queryOptions=0) at s/commands_public.cpp:1367
#23 0x000000000076cf2b in mongo::SingleStrategy::queryOp (this=0x14e9b60, r=...) at s/strategy_single.cpp:58
#24 0x00000000007b4927 in mongo::Request::process (this=0x7f74a189eb10, attempt=0) at s/request.cpp:132
#25 0x00000000007c6cf1 in mongo::ShardedMessageHandler::process (this=<value optimized out>, m=..., p=0x15e5a50,
le=0x15da620) at s/server.cpp:95
#26 0x00000000005e6a07 in mongo::pms::threadRun (inPort=0x15e5a50) at util/net/message_server_port.cpp:74
#27 0x00007f74b76028ca in start_thread () from /lib/libpthread.so.0
#28 0x00007f74b6bbd86d in clone () from /lib/libc.so.6
#29 0x0000000000000000 in ?? ()
— /2 —
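The two backtraces can be compared mechanically. The following is a minimal Python sketch, not part of the original report: it extracts function names from abbreviated excerpts of the traces above and lists the innermost frames they share. The helper names (frame_functions, shared_inner_frames) and the regular expression are illustrative assumptions, and the excerpts omit frames 18-19 and the wrapped "at ..." continuation lines.

```python
import re

# Matches the start of a gdb frame line and captures the function name, e.g.
#   "#10 0x00007f4eac29994c in free () from /lib/libc.so.6"
#   "#14 mongo::DBClientReplicaSet::isntMaster (this=0x1bcd930) at ..."
# Limitation: names containing " (" (such as function_invoker4<bool (...)>)
# would be truncated at that point.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def frame_functions(backtrace):
    """Return function names in frame order, skipping unresolved '??' frames."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2) != "??":
            names.append(match.group(2))
    return names


def shared_inner_frames(first, second):
    """Return the leading run of function names common to both lists."""
    shared = []
    for a, b in zip(first, second):
        if a != b:
            break
        shared.append(a)
    return shared


# Abbreviated excerpts of backtraces 1 and 2 from this ticket (frames 10-17, 20).
TRACE_1 = """
#10 0x00007f4eac29994c in free () from /lib/libc.so.6
#11 0x00000000005800b2 in checked_delete<mongo::DBClientConnection> (this=0x1bcd930)
#12 ~scoped_ptr (this=0x1bcd930) at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:80
#13 boost::scoped_ptr<mongo::DBClientConnection>::reset (this=0x1bcd930)
#14 mongo::DBClientReplicaSet::isntMaster (this=0x1bcd930) at client/dbclient_rs.cpp:674
#15 0x000000000057e5c5 in mongo::DBClientConnection::runCommand (this=0x1bcd998, dbname=<value optimized out>,
#16 0x000000000074a654 in mongo::setShardVersion (conn=..., ns=..., version=..., authoritative=false, result=...)
#17 0x00000000007f3d2d in checkShardVersion (conn_in=<value optimized out>, ns=..., authoritative=false, tryNumber=1)
#20 mongo::ClientConnections::checkVersions (this=0x19ab9c0, ns=...) at s/shardconnection.cpp:155
"""

TRACE_2 = """
#10 0x00007f74b6b6494c in free () from /lib/libc.so.6
#11 0x00000000005800b2 in checked_delete<mongo::DBClientConnection> (this=0x15e5760)
#12 ~scoped_ptr (this=0x15e5760) at /opt/extra/include/boost/smart_ptr/scoped_ptr.hpp:80
#13 boost::scoped_ptr<mongo::DBClientConnection>::reset (this=0x15e5760)
#14 mongo::DBClientReplicaSet::isntMaster (this=0x15e5760) at client/dbclient_rs.cpp:674
#15 0x000000000057e5c5 in mongo::DBClientConnection::runCommand (this=0x15e57c8, dbname=<value optimized out>,
#16 0x000000000074a654 in mongo::setShardVersion (conn=..., ns=..., version=..., authoritative=false, result=...)
#17 0x00000000007f3d2d in checkShardVersion (conn_in=<value optimized out>, ns=..., authoritative=false, tryNumber=1)
#20 mongo::ShardConnection::_finishInit (this=0x7f74a189dd80) at s/shardconnection.cpp:217
"""

shared = shared_inner_frames(frame_functions(TRACE_1), frame_functions(TRACE_2))
for name in shared:
    print(name)
```

On these excerpts the shared run starts at free and ends at checkShardVersion; the traces differ only in the caller that reached the shard version check (an insert in trace 1, a count command in trace 2).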

I have both core dumps and can send or upload them on request (300-400 MB per core).



 Comments   
Comment by Anton Batenev [ 13/Feb/12 ]

The crash has not reproduced. Please resolve as a duplicate.

Comment by Eliot Horowitz (Inactive) [ 12/Feb/12 ]

This should be a duplicate of the issue you listed.
If the problem persists, please let us know.

Comment by Anton Batenev [ 07/Feb/12 ]

Now testing 2.0.3-rc0 to see whether the crash reproduces.

Comment by Anton Batenev [ 07/Feb/12 ]

However, at the time of the failure there was no primary/secondary change (both secondaries were down).

Comment by Anton Batenev [ 07/Feb/12 ]

Probably related to SERVER-4699.
