- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.0.7
- Component/s: Sharding
- Environment: CentOS 6.3, MongoDB v2.0.7 release. Note this may affect v2.2 as well.
In mongos, ReplicaSetMonitor::_check() calls _checkConnection(), which acquires _checkConnectionLock and then calls _checkStatus(). _checkStatus() issues a blocking replSetGetStatus request to other nodes while the lock is still held.
This causes all commands sent to mongos to hang, apparently due to a combination of WriteBackCommand::run() on mongod and WriteBackListener::run() on mongos.
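The hazard described above, a mutex held across a blocking request, can be reproduced in miniature. The following is a minimal sketch, not MongoDB code: `checkConnectionLock`, `blockingStatusRequest`, `checkConnectionHoldingLock`, and `secondCallerGetsLock` are hypothetical names, and a sleep stands in for the blocking network call. It shows that a second caller cannot take the lock for as long as the first caller is blocked.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for _checkConnectionLock (a timed mutex is used
// here only so the waiter can give up instead of hanging forever).
std::timed_mutex checkConnectionLock;

// Stand-in for a blocking network request such as replSetGetStatus;
// in this sketch it simply sleeps.
void blockingStatusRequest(std::chrono::milliseconds duration) {
    std::this_thread::sleep_for(duration);
}

// Takes the lock and keeps it for the whole blocking call.
void checkConnectionHoldingLock(std::chrono::milliseconds duration,
                                std::promise<void>& lockTaken) {
    std::lock_guard<std::timed_mutex> lk(checkConnectionLock);
    lockTaken.set_value();            // signal that the lock is now held
    blockingStatusRequest(duration);  // lock stays held while blocked
}

// Returns true if a second caller obtains the lock within `patience`
// while the first caller is inside the blocking call.
bool secondCallerGetsLock(std::chrono::milliseconds holdFor,
                          std::chrono::milliseconds patience) {
    std::promise<void> lockTaken;
    std::future<void> taken = lockTaken.get_future();
    std::thread holder(checkConnectionHoldingLock, holdFor, std::ref(lockTaken));
    taken.wait();  // make sure the holder owns the lock before we try
    bool acquired = checkConnectionLock.try_lock_for(patience);
    if (acquired) {
        checkConnectionLock.unlock();
    }
    holder.join();
    return acquired;
}
```

Calling `secondCallerGetsLock` with a hold time longer than the waiter's patience returns false. In the scenario reported here the waiters use an untimed lock, so they would wait for as long as the remote node takes to answer.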
Exact steps to reproduce are unclear; however, this was encountered after some combination of the following steps:
- Removing a node from a replica set (via rs.reconfig())
- Hiding a node (via rs.reconfig())
- Unhiding a node (via rs.reconfig())
Note the following stack traces while mongos was unable to process any command:
mongod WriteBackCommand, waiting to pop from blocking queue:
Thread 25 (Thread 0x7f6430558700 (LWP 28047)):
#0 0x00000032b720b7bb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000000000a39f99 in mongo::WriteBackCommand::run(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&, bool) ()
#2 0x000000000097d994 in mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) ()
#3 0x000000000097ef8f in mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
#4 0x00000000009420c5 in mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) ()
#5 0x0000000000944bf0 in mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) ()
#6 0x0000000000888fd7 in ?? ()
#7 0x000000000088dbb9 in mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
#8 0x0000000000aa0b38 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
#9 0x0000000000638767 in mongo::pms::threadRun(mongo::MessagingPort*) ()
#10 0x00000032b7207851 in start_thread () from /lib64/libpthread.so.0
#11 0x00000032b6ee811d in clone () from /lib64/libc.so.6
mongos WriteBackListener::run(), which has acquired _checkConnectionLock:
Thread 18 (Thread 0x7f4b83d0b700 (LWP 27781)):
#0 0x00000032b720e94c in recv () from /lib64/libpthread.so.0
#1 0x0000000000550803 in mongo::Socket::_recv(char*, int) ()
#2 0x0000000000550819 in mongo::Socket::unsafe_recv(char*, int) ()
#3 0x0000000000551cf4 in mongo::Socket::recv(char*, int) ()
#4 0x0000000000558db6 in mongo::MessagingPort::recv(mongo::Message&) ()
#5 0x000000000055961b in mongo::MessagingPort::recv(mongo::Message const&, mongo::Message&) ()
#6 0x0000000000559aa4 in mongo::MessagingPort::call(mongo::Message&, mongo::Message&) ()
#7 0x000000000057898c in mongo::DBClientConnection::call(mongo::Message&, mongo::Message&, bool, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
#8 0x00000000005945fd in mongo::DBClientCursor::init() ()
#9 0x0000000000567edc in mongo::DBClientBase::query(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) ()
#10 0x000000000057e781 in mongo::DBClientConnection::query(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) ()
#11 0x00000000005760c3 in mongo::DBClientInterface::findN(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query, int, int, mongo::BSONObj const*, int) ()
#12 0x0000000000576c12 in mongo::DBClientInterface::findOne(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query const&, mongo::BSONObj const*, int) ()
#13 0x000000000057eafa in mongo::DBClientConnection::runCommand(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObj&, int) ()
#14 0x0000000000584e7d in mongo::ReplicaSetMonitor::_checkStatus(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#15 0x0000000000586560 in mongo::ReplicaSetMonitor::_checkConnection(mongo::DBClientConnection*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, int) ()
#16 0x00000000005875ce in mongo::ReplicaSetMonitor::_check(bool) ()
#17 0x00000000005881fe in mongo::ReplicaSetMonitor::getMaster() ()
#18 0x00000000005884bf in mongo::DBClientReplicaSet::checkMaster() ()
#19 0x000000000058b1d6 in mongo::DBClientReplicaSet::connect() ()
#20 0x00000000005790b9 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#21 0x0000000000562972 in mongo::DBConnectionPool::get(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double) ()
#22 0x00000000005c3e6c in mongo::ShardConnection::_init() ()
#23 0x00000000005c43a5 in mongo::ShardConnection::ShardConnection(mongo::Shard const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#24 0x0000000000768367 in mongo::Strategy::insert(mongo::Shard const&, char const*, mongo::BSONObj const&, int, bool) ()
#25 0x000000000076ba74 in mongo::ShardStrategy::_insert(mongo::Request&, mongo::DbMessage&, boost::shared_ptr<mongo::ChunkManager const>) ()
#26 0x0000000000773514 in mongo::ShardStrategy::writeOp(int, mongo::Request&) ()
#27 0x00000000007b4b7d in mongo::Request::process(int) ()
#28 0x00000000007ece27 in mongo::WriteBackListener::run() ()
#29 0x0000000000524e4f in mongo::BackgroundJob::jobBody(boost::shared_ptr<mongo::BackgroundJob::JobStatus>) ()
#30 0x00000000005271c4 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >::run() ()
#31 0x00000000008053f0 in thread_proxy ()
#32 0x00000032b7207851 in start_thread () from /lib64/libpthread.so.0
#33 0x00000032b6ee811d in clone () from /lib64/libc.so.6
mongos balancer thread, waiting to acquire _checkConnectionLock (this is just one of many threads waiting for this lock):
Thread 23 (Thread 0x7f4b87011700 (LWP 27676)):
#0 0x00000032b720e054 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00000032b7209388 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00000032b7209257 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x000000000058d013 in mongo::mutex::scoped_lock::scoped_lock(mongo::mutex&) ()
#4 0x0000000000585cf5 in mongo::ReplicaSetMonitor::_checkConnection(mongo::DBClientConnection*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, bool, int) ()
#5 0x00000000005875ce in mongo::ReplicaSetMonitor::_check(bool) ()
#6 0x00000000005881fe in mongo::ReplicaSetMonitor::getMaster() ()
#7 0x00000000005884bf in mongo::DBClientReplicaSet::checkMaster() ()
#8 0x000000000058b1d6 in mongo::DBClientReplicaSet::connect() ()
#9 0x00000000005790b9 in mongo::ConnectionString::connect(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, double) const ()
#10 0x0000000000562972 in mongo::DBConnectionPool::get(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, double) ()
#11 0x0000000000562f62 in mongo::ScopedDbConnection::ScopedDbConnection(mongo::Shard const*, double) ()
#12 0x0000000000756b15 in mongo::Shard::runCommand(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&) const ()
#13 0x00000000007d9444 in mongo::Balancer::_checkOIDs() ()
#14 0x00000000007dae61 in mongo::Balancer::run() ()
#15 0x0000000000524e4f in mongo::BackgroundJob::jobBody(boost::shared_ptr<mongo::BackgroundJob::JobStatus>) ()
#16 0x00000000005271c4 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >::run() ()
#17 0x00000000008053f0 in thread_proxy ()
#18 0x00000032b7207851 in start_thread () from /lib64/libpthread.so.0
#19 0x00000032b6ee811d in clone () from /lib64/libc.so.6
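A common general way to keep waiters like the balancer thread above from piling up is to avoid holding the lock during the blocking request: copy the needed state under the lock, release it, do the slow work, then re-acquire the lock to apply the result. The sketch below illustrates only that pattern. It is not the change made in MongoDB, and `MonitorSketch`, `refresh`, `hosts`, and the `probe` callback are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified monitor; not the real ReplicaSetMonitor.
class MonitorSketch {
public:
    explicit MonitorSketch(std::vector<std::string> hosts)
        : hosts_(std::move(hosts)) {}

    // Returns a copy of the current host list, taken under the lock.
    std::vector<std::string> hosts() {
        std::lock_guard<std::mutex> lk(mu_);
        return hosts_;
    }

    // `probe` stands in for a blocking status request and returns true
    // if the host should be kept. Returns the number of hosts kept.
    std::size_t refresh(const std::function<bool(const std::string&)>& probe) {
        std::vector<std::string> snapshot;
        {
            std::lock_guard<std::mutex> lk(mu_);
            snapshot = hosts_;  // copy state while holding the lock
        }                       // lock released before any blocking work

        std::vector<std::string> kept;
        for (const std::string& host : snapshot) {
            if (probe(host)) {  // slow call runs with no lock held
                kept.push_back(host);
            }
        }

        std::lock_guard<std::mutex> lk(mu_);
        hosts_ = kept;          // short critical section to apply the result
        return hosts_.size();
    }

private:
    std::mutex mu_;
    std::vector<std::string> hosts_;
};
```

The trade-off is that the host list may change between the snapshot and the final update. This sketch simply overwrites it, so a real implementation would need to reconcile concurrent changes.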
- Related to: SERVER-7278 mongos doesn't always update shards in response to replica set changes (Closed)