Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.1.7
Affects Version/s: 3.1.7
Component/s: Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Sharding 8 08/28/15
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The relevant stack trace from the hang analyzer is below. Note the reentrancy into the catalog manager: inside a catalog manager call (writeConfigServerDirect), ShardConnection refreshes sharding metadata through the forwarding catalog manager. If the inner operation detects that the catalog manager needs to be changed, it does not release the lock held by the outer operation, so it waits forever for the catalog manager to be changed out.

  mongo::ForwardingCatalogManager::waitForCatalogManagerChange() ()
  mongo::ForwardingCatalogManager::getAllShards(std::vector<mongo::ShardType, std::allocator<mongo::ShardType> >*) ()
  mongo::ShardRegistry::reload() ()
  mongo::ShardRegistry::getShard(std::string const&)
 
  mongo::(anonymous namespace)::checkShardVersion(mongo::OperationContext*, mongo::DBClientBase*, std::string const&, std::shared_ptr<mongo::ChunkManager>, bool, int) ()
  mongo::VersionManager::checkShardVersionCB(mongo::OperationContext*, mongo::ShardConnection*, bool, int) ()
  mongo::ShardConnection::_finishInit() ()
  mongo::ShardConnection::get() ()
  mongo::DBClientMultiCommand::sendAll() ()
  mongo::ConfigCoordinator::executeBatch(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
  mongo::CatalogManagerLegacy::writeConfigServerDirect(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
  mongo::ForwardingCatalogManager::writeConfigServerDirect(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
  mongo::CatalogManager::update(std::string const&, mongo::BSONObj const&, mongo::BSONObj const&, bool, bool, mongo::BatchedCommandResponse*) ()
  mongo::Balancer::_ping(mongo::OperationContext*, bool) ()
  mongo::Balancer::run() ()
  mongo::BackgroundJob::jobBody() ()
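
The reentrancy problem described above can be illustrated with a small model. This is a sketch only, not MongoDB code: ForwardingModel, beginOperation, endOperation, innerCallThatNeedsSwap and outerCallThatReenters are hypothetical names, the in-flight operation counter stands in for whatever lock the real forwarding catalog manager holds per call, and a timeout replaces the real unbounded wait so the example terminates.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical model (not MongoDB code): counts in-flight operations, standing
// in for the per-call lock a forwarding catalog manager might hold.
class ForwardingModel {
public:
    void beginOperation() {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_activeOps;
    }

    void endOperation() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            assert(_activeOps > 0);
            --_activeOps;
        }
        _cv.notify_all();
    }

    // Waits until no operation is in flight, i.e. until a swap could proceed.
    // Returns false on timeout; a wait without a timeout would hang instead.
    bool waitForChange(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] { return _activeOps == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _activeOps = 0;
};

// Inner call (compare getAllShards in the trace): it releases its own hold,
// then waits for the swap to happen.
bool innerCallThatNeedsSwap(ForwardingModel& fcm, std::chrono::milliseconds timeout) {
    fcm.beginOperation();
    fcm.endOperation();  // only the inner hold is dropped
    return fcm.waitForChange(timeout);
}

// Outer call (compare writeConfigServerDirect in the trace): it keeps its hold
// while its work re-enters the same object on the same thread.
bool outerCallThatReenters(ForwardingModel& fcm, std::chrono::milliseconds timeout) {
    fcm.beginOperation();
    bool swapped = innerCallThatNeedsSwap(fcm, timeout);  // outer hold still counted
    fcm.endOperation();
    return swapped;
}
```

Called on its own, innerCallThatNeedsSwap returns true because nothing else is in flight. Called through outerCallThatReenters, the wait can never be satisfied: the only operation still in flight is the caller's own outer operation, which cannot finish until the wait returns.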

Assignee: Kaloian Manassiev
Reporter: Andy Schwerin
Participants: Andy Schwerin, Githook User, Kaloian Manassiev
Votes: 0
Watchers: 47

Created: Aug 20 2015 02:30:54 AM UTC
Updated: Sep 19 2015 12:09:54 AM UTC
Resolved: Aug 20 2015 05:25:58 PM UTC
Confidence Status Last Update: 20/Aug/15 3:07 PM
