[SERVER-6654] Determine if a shard hitting its connection limit causes mongos crash in ReplicaSetMonitor::_checkConnection() Created: 31/Jul/12  Updated: 15/Aug/12  Resolved: 31/Jul/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ian Daniel Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDB version 2.0.6


Issue Links:
Duplicate
is duplicated by SERVER-5594 missed check of node index when initi... Closed
Related
Participants:

 Description   

The following mongos 2.0.6 crash appears to have occurred when a shard hit its connection limit. The stack trace is:

Thu Jul 19 21:55:48 [Balancer] DBClientCursor::init call() failed
Received signal 11
Backtrace: 0x54e5b5 0x7f20b429d900 0x585f03 0x588061 0x5888bd 0x756723 0x756eab 0x75c402 0x7dbbd4 0x7dc15f 0x52503f 0x5273b4 0x8069e0 0x7f20b4da27f1 0x7f20b4350ccd
mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x54e5b5]
/lib64/libc.so.6(+0x32900)[0x7f20b429d900]
mongos(_ZN5mongo17ReplicaSetMonitor16_checkConnectionEPNS_18DBClientConnectionERSsbi+0x1543)[0x585f03]
mongos(_ZN5mongo17ReplicaSetMonitorC1ERKSsRKSt6vectorINS_11HostAndPortESaIS4_EE+0x381)[0x588061]
mongos(_ZN5mongo17ReplicaSetMonitor3getERKSsRKSt6vectorINS_11HostAndPortESaIS4_EE+0x1fd)[0x5888bd]
mongos(_ZN5mongo5Shard7_rsInitEv+0x133)[0x756723]
mongos(_ZN5mongo5Shard8_setAddrERKSs+0x13b)[0x756eab]
mongos(_ZN5mongo15StaticShardInfo6reloadEv+0xab2)[0x75c402]
mongos(_ZN5mongo8Balancer5_initEv+0x44)[0x7dbbd4]
mongos(_ZN5mongo8Balancer3runEv+0x3f)[0x7dc15f]
mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf)[0x52503f]
mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74)[0x5273b4]
mongos(thread_proxy+0x80)[0x8069e0]
/lib64/libpthread.so.0(+0x77f1)[0x7f20b4da27f1]
/lib64/libc.so.6(clone+0x6d)[0x7f20b4350ccd]
===

The relevant code is in client/dbclient_rs.cpp. Determine whether this is a bug and, if so, whether it has already been fixed for 2.2.



 Comments   
Comment by Randolph Tan [ 31/Jul/12 ]

addr2line output:

/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/util/signal_handlers.cpp:94
??
??:0
mongo::ReplicaSetMonitor::_checkConnection(mongo::DBClientConnection*, std::string&, bool, int)
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/client/dbclient_rs.cpp:555
ReplicaSetMonitor
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/client/dbclient_rs.cpp:97
reset<mongo::ReplicaSetMonitor>
/opt/extra/include/boost/smart_ptr/shared_ptr.hpp:391
__gnu_cxx::new_allocator<mongo::HostAndPort>::deallocate(mongo::HostAndPort*, unsigned long)
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/s/shard.cpp:256
mongo::Shard::_setAddr(std::string const&)
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/s/shard.cpp:250
shared_ptr<mongo::Shard>
/opt/extra/include/boost/smart_ptr/shared_ptr.hpp:187
mongo::Balancer::_init()
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/s/balance.cpp:233
mongo::Balancer::run()
/mnt/home/buildbot/slave/Linux_64bit_V2.0/mongo/s/balance.cpp:259
boost::shared_ptr<mongo::BackgroundJob::JobStatus>::operator->() const
/opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
~shared_count
/opt/extra/include/boost/smart_ptr/detail/shared_count.hpp:217
thread_proxy
??:0
??
??:0
??
??:0

The frame at dbclient_rs.cpp:555 points to this line in ReplicaSetMonitor::_checkConnection:

if ( errorOccured && nodesOffset >= 0 ) {
    scoped_lock lk( _lock );
    _nodes[nodesOffset].ok = false; // <----- this line
}

This bug is fixed by SERVER-5594, with follow-up work in SERVER-6512.
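
For illustration only, here is a minimal sketch of the kind of guard that avoids this class of crash. The snippet above checks only that nodesOffset is non-negative before indexing _nodes. The sketch assumes the segfault came from an offset outside the vector's range; that is an assumption, and this is not the actual SERVER-5594 patch. The names Node, MonitorSketch, markDownIfError and isOk are hypothetical stand-ins, and std::mutex replaces the original scoped_lock.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the monitor's per-node state.
struct Node {
    bool ok = true;
};

// Hypothetical stand-in for ReplicaSetMonitor; not the real class.
class MonitorSketch {
public:
    explicit MonitorSketch(std::size_t nodeCount) : _nodes(nodeCount) {}

    // Marks the node at nodesOffset as down when an error occurred.
    // Returns true if a node was updated, false if nothing was done.
    bool markDownIfError(bool errorOccurred, int nodesOffset) {
        if (!errorOccurred || nodesOffset < 0) {
            return false;
        }
        std::lock_guard<std::mutex> lk(_lock);
        // Assumed guard: check the upper bound under the lock as well,
        // so an out-of-range offset never reaches the vector.
        if (static_cast<std::size_t>(nodesOffset) >= _nodes.size()) {
            return false;
        }
        _nodes[static_cast<std::size_t>(nodesOffset)].ok = false;
        return true;
    }

    // Reports whether the node at index i is still considered healthy.
    bool isOk(std::size_t i) const {
        std::lock_guard<std::mutex> lk(_lock);
        return i < _nodes.size() && _nodes[i].ok;
    }

private:
    mutable std::mutex _lock;
    std::vector<Node> _nodes;
};
```

With this guard, a call such as markDownIfError(true, 0) on a monitor whose node list is still empty returns false instead of writing past the end of the vector.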
