[SERVER-20058] mongos deadlock while replacing catalog manager Created: 20/Aug/15 Updated: 19/Sep/15 Resolved: 20/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.1.7 |
| Fix Version/s: | 3.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andy Schwerin | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 8 08/28/15 |
| Participants: |
| Description |
|
The important stack trace from the hang analyzer is below. Note the reentrancy into the catalog manager: inside a catalog manager call, ShardConnection goes to refresh sharding metadata via the forwarding catalog manager. If the inner operation detects that the catalog manager needs to be replaced, the outer operation never drops its lock, so the inner operation waits forever for the catalog manager to be swapped out.
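The reentrancy described above can be sketched roughly as follows. This is a minimal illustration, not the server's code: `ForwardingManagerSketch`, `OperationGuard`, `trySwap`, `reentrantSwap` and `swapAfterRelease` are hypothetical names, and the timeout exists only so the sketch can report the hang instead of reproducing it (the real wait has no timeout).

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a forwarding catalog manager: each call registers
// itself as in flight, and a swap must wait until no call is in flight.
class ForwardingManagerSketch {
public:
    // RAII guard held for the duration of one catalog manager call.
    class OperationGuard {
    public:
        explicit OperationGuard(ForwardingManagerSketch& m) : _m(m) {
            std::lock_guard<std::mutex> lk(_m._mutex);
            ++_m._activeOps;
        }
        ~OperationGuard() {
            std::lock_guard<std::mutex> lk(_m._mutex);
            --_m._activeOps;
            _m._cv.notify_all();
        }
        OperationGuard(const OperationGuard&) = delete;
        OperationGuard& operator=(const OperationGuard&) = delete;

    private:
        ForwardingManagerSketch& _m;
    };

    // Waits for in-flight operations to drain, then "replaces" the manager.
    // Returns false if the operations did not drain within the timeout.
    bool trySwap(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        if (!_cv.wait_for(lk, timeout, [this] { return _activeOps == 0; })) {
            return false;
        }
        ++_generation;
        return true;
    }

    int generation() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _generation;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    int _activeOps = 0;
    int _generation = 0;
};

// The outer call still holds its guard when the inner step, on the same
// thread, asks for the swap. The swap can never observe zero active ops.
bool reentrantSwap(ForwardingManagerSketch& m) {
    ForwardingManagerSketch::OperationGuard outer(m);
    return m.trySwap(std::chrono::milliseconds(50));
}

// The outer call finishes and releases its guard before the swap is requested.
bool swapAfterRelease(ForwardingManagerSketch& m) {
    { ForwardingManagerSketch::OperationGuard outer(m); }
    return m.trySwap(std::chrono::milliseconds(50));
}
```

With a fresh `ForwardingManagerSketch`, `reentrantSwap` times out and leaves the generation unchanged, while `swapAfterRelease` succeeds. Avoiding the nested call altogether, rather than releasing the outer hold, is the other way out of this pattern.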
|
| Comments |
| Comment by Githook User [ 20/Aug/15 ] |
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: Adds an argument to DBClientMultiCommand so it doesn't do SetShardVersion |
| Comment by Kaloian Manassiev [ 20/Aug/15 ] |
|
This is a side effect of an earlier fix. DBClientMultiCommand should go through ShardConnection at least for the shards, because otherwise there are cases where the connections are not sharding aware. For the config servers, however, it should be fine to create plain DBClientConnections. |
| Comment by Andy Schwerin [ 20/Aug/15 ] |
|
Indeed, the error appears to be that DBClientMultiCommand::sendAll() is creating ShardConnections for connections of type MASTER when dispatching commands to the three config servers. VersionManager::isVersionableCB sees that the connections are of type MASTER, and decides this must mean they're to standalone shards, rather than a config server, and so treats the connections as versionable. I suspect the error is that CatalogManagerLegacy should not be using DBClientMultiCommand to execute config server writes. kaloian.manassiev, are there other reasonable options? |
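As a rough illustration of the misclassification described in this comment, the sketch below contrasts a check that looks only at the connection type with one that lets the caller opt out for config server targets. All names here (`ConnKind`, `ConnSketch`, `targetsConfigServer`, both functions) are hypothetical simplifications, not the server's actual types or the signature of `VersionManager::isVersionableCB`, and the real check may accept more connection types than the one shown.

```cpp
// Hypothetical, simplified connection kinds; not the server's actual enum.
enum class ConnKind { Master, Other };

// Illustrative connection descriptor. The targetsConfigServer flag is an
// assumption of this sketch, standing in for "the caller knows the target".
struct ConnSketch {
    ConnKind kind;
    bool targetsConfigServer;
};

// Type-only check: a MASTER connection is assumed to point at a standalone
// shard, so a MASTER connection to a config server is treated as versionable.
bool isVersionableByTypeOnly(const ConnSketch& c) {
    return c.kind == ConnKind::Master;
}

// Same check, but the caller can state that the target is a config server,
// in which case no shard version handling is attempted.
bool isVersionableWithOptOut(const ConnSketch& c) {
    return !c.targetsConfigServer && isVersionableByTypeOnly(c);
}
```

A MASTER connection to a config server passes the type-only check but not the opt-out variant, while a MASTER connection to a shard passes both.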
| Comment by Andy Schwerin [ 20/Aug/15 ] |
|
It's strange that we're using a ShardConnection (that checks shard version information) to do operations on the config server. |