[SERVER-22862] Deadlock between ReplicaSetMonitor updating the connection string for a shard and reloading the ShardRegistry
Created: 25/Feb/16  Updated: 25/Jan/17  Resolved: 26/Feb/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.2.4, 3.3.3

Type: Bug
Priority: Major - P3
Reporter: Spencer Brody (Inactive)
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-22485 ShardNotFound error when looking up r... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Sharding 11 (03/11/16)
Participants:
Linked BF Score: 0

 Description   

 [2016/02/24 15:16:18.727] Thread 10 (Thread 0x2b4968191940 (LWP 15297)):
 [2016/02/24 15:16:18.727] #0  0x00002b495f481654 in __lll_lock_wait () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.727] #1  0x00002b495f47cf4a in _L_lock_1034 () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.727] #2  0x00002b495f47ce0c in pthread_mutex_lock () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.727] #3  0x0000000000701dd8 in mongo::ReplicaSetMonitor::getServerAddress() const ()
 [2016/02/24 15:16:18.727] #4  0x00000000006ffc66 in mongo::RemoteCommandTargeterRS::connectionString() ()
 [2016/02/24 15:16:18.727] #5  0x0000000000af3e57 in mongo::ShardRegistry::_updateLookupMapsForShard_inlock(std::shared_ptr<mongo::Shard>, boost::optional<mongo::ConnectionString const&>) ()
 [2016/02/24 15:16:18.727] #6  0x0000000000af4770 in mongo::ShardRegistry::_addShard_inlock(mongo::ShardType const&, bool) ()
 [2016/02/24 15:16:18.727] #7  0x0000000000af55ad in mongo::ShardRegistry::reload(mongo::OperationContext*) ()
 [2016/02/24 15:16:18.728] #8  0x0000000000a27dd6 in mongo::Balancer::run() ()
 [2016/02/24 15:16:18.728] #9  0x0000000000bc4d80 in mongo::BackgroundJob::jobBody() ()
 [2016/02/24 15:16:18.728] #10 0x0000000000e15190 in execute_native_thread_routine ()
 [2016/02/24 15:16:18.728] #11 0x00002b495f47a83d in start_thread () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.728] #12 0x00002b495f764fdd in clone () from /lib64/libc.so.6
 [2016/02/24 15:16:18.728] #13 0x0000000000000000 in ?? ()

 [2016/02/24 15:16:18.731] Thread 5 (Thread 0x2b496a995940 (LWP 15448)):
 [2016/02/24 15:16:18.731] #0  0x00002b495f481654 in __lll_lock_wait () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.731] #1  0x00002b495f47cf4a in _L_lock_1034 () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.732] #2  0x00002b495f47ce0c in pthread_mutex_lock () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.732] #3  0x0000000000af3152 in mongo::ShardRegistry::lookupRSName(std::string const&) const ()
 [2016/02/24 15:16:18.732] #4  0x0000000000b5061f in mongo::ConfigServer::replicaSetChangeShardRegistryUpdateHook(std::string const&, std::string const&) ()
 [2016/02/24 15:16:18.732] #5  0x0000000000707884 in mongo::ReplicaSetMonitor::Refresher::receivedIsMasterFromMaster(mongo::ReplicaSetMonitor::IsMasterReply const&) ()
 [2016/02/24 15:16:18.732] #6  0x0000000000708313 in mongo::ReplicaSetMonitor::Refresher::receivedIsMaster(mongo::HostAndPort const&, long, mongo::BSONObj const&) ()
 [2016/02/24 15:16:18.732] #7  0x000000000070890d in mongo::ReplicaSetMonitor::Refresher::_refreshUntilMatches(mongo::ReadPreferenceSetting const*) ()
 [2016/02/24 15:16:18.732] #8  0x00000000007096c1 in mongo::(anonymous namespace)::ReplicaSetMonitorWatcher::run() ()
 [2016/02/24 15:16:18.732] #9  0x0000000000bc4d80 in mongo::BackgroundJob::jobBody() ()
 [2016/02/24 15:16:18.732] #10 0x0000000000e15190 in execute_native_thread_routine ()
 [2016/02/24 15:16:18.732] #11 0x00002b495f47a83d in start_thread () from /lib64/libpthread.so.0
 [2016/02/24 15:16:18.732] #12 0x00002b495f764fdd in clone () from /lib64/libc.so.6
 [2016/02/24 15:16:18.732] #13 0x0000000000000000 in ?? ()
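
Read together, the two traces look like a lock-order inversion. Thread 10 is inside ShardRegistry::reload and its _inlock helpers, so it presumably holds the ShardRegistry mutex while it blocks on the ReplicaSetMonitor mutex in getServerAddress(). Thread 5 is inside the ReplicaSetMonitor refresh path, so it presumably holds the monitor's mutex while it blocks on the ShardRegistry mutex in lookupRSName(). The sketch below illustrates one general way to avoid that pattern: never hold both mutexes at once. It is not the change made in the linked commits. Monitor, Registry, addShard, lookup and setServerAddress are hypothetical stand-ins for illustration, not MongoDB APIs.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a replica set monitor: it guards its server
// address with its own mutex.
class Monitor {
public:
    explicit Monitor(std::string addr) : _addr(std::move(addr)) {}

    std::string getServerAddress() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _addr;
    }

    // Stores the new address, then invokes the change hook only after
    // _mutex has been released, so the hook may safely take other locks
    // or call back into this object.
    template <typename Hook>
    void setServerAddress(const std::string& addr, Hook hook) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _addr = addr;
        }
        hook(addr);  // runs without holding _mutex
    }

private:
    mutable std::mutex _mutex;
    std::string _addr;
};

// Hypothetical stand-in for a shard registry: it maps connection strings
// to shard ids under its own mutex.
class Registry {
public:
    void addShard(const std::string& shardId, const Monitor& monitor) {
        // Read from the monitor BEFORE taking the registry mutex, so the
        // two mutexes are never held at the same time.
        const std::string connStr = monitor.getServerAddress();
        std::lock_guard<std::mutex> lk(_mutex);
        _byConnStr[connStr] = shardId;
    }

    // Returns the shard id for connStr, or an empty string if unknown.
    std::string lookup(const std::string& connStr) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _byConnStr.find(connStr);
        return it == _byConnStr.end() ? std::string() : it->second;
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, std::string> _byConnStr;
};
```

With this shape, a hook passed to setServerAddress can call Registry::lookup or Registry::addShard, and addShard can call Monitor::getServerAddress, without either thread waiting on a mutex while holding the other one.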



 Comments   
Comment by Githook User [ 29/Feb/16 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-22862 Fix deadlock between ShardRegistry and ReplicaSetMonitor

(cherry picked from commit bbfd2e1c9884e13aa0203cc1887d956fbb486c6b)
Branch: v3.2
https://github.com/mongodb/mongo/commit/44aa73fcb3cc5043e79b7849089eeb8ffcb4366d

Comment by Githook User [ 26/Feb/16 ]

Author: Spencer T Brody (stbrody) <spencer@mongodb.com>

Message: SERVER-22862 Fix deadlock between ShardRegistry and ReplicaSetMonitor
Branch: master
https://github.com/mongodb/mongo/commit/bbfd2e1c9884e13aa0203cc1887d956fbb486c6b
