[SERVER-4581] Mongos and mongod keep trying to reconnect to removed shard Created: 29/Dec/11 Updated: 11/Jul/16 Resolved: 15/Jun/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.0.2 |
| Fix Version/s: | 2.1.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||
| Participants: | |||||
| Description |
|
When you remove a shard, the metadata associated with it does not seem to get cleaned up, so if that shard is then taken offline, the mongoses and the other shards keep trying to reconnect to it. |
| Comments |
| Comment by auto [ 15/Jun/12 ] |
|
Author: Spencer T Brody <spencer@10gen.com>, 2012-06-12T08:25:46-07:00. Message: Remove connections to removed shard from connection pools. |
| Comment by auto [ 12/Jun/12 ] |
|
Author: Spencer T Brody <spencer@10gen.com>, 2012-03-26T11:31:33-07:00. Message: Clean up ReplicaSetMonitor when the whole set has been down for a long time. |
| Comment by Spencer Brody (Inactive) [ 27/Mar/12 ] |
|
This is also a problem for the mongos. |
| Comment by Eliot Horowitz (Inactive) [ 30/Dec/11 ] |
|
The 2nd is definitely better. |
| Comment by Spencer Brody (Inactive) [ 30/Dec/11 ] |
|
I see two possible approaches to solving this. Currently the removeShard command never gets sent to the actual shards, only to the mongos. One option is to have the removeShard command on the mongos call a removeShard command on each shard, telling it to remove its metadata associated with the removed shard. The other option is to have the shard reload the sharding info after some number of failed attempts to reconnect to a down shard. I'm leaning towards the latter right now, as it is probably a good idea to have some kind of behavior like this anyway. |
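
The second option above (reload the sharding info after a number of failed reconnect attempts) might look roughly like the sketch below. This is only an illustration of the idea in Python, not the server's actual C++ code: `ShardReconnectTracker`, `reload_config`, and `max_failures` are hypothetical names, and the threshold of 3 is an arbitrary assumption.

```python
from typing import Callable, Dict, Set


class ShardReconnectTracker:
    """Hypothetical helper: counts consecutive failed reconnects per shard
    and reloads the shard list once a failure threshold is reached."""

    def __init__(self, reload_config: Callable[[], Set[str]], max_failures: int = 3):
        # reload_config stands in for "re-read the sharding info from the
        # config servers"; it returns the names of the current shards.
        self._reload_config = reload_config
        self._max_failures = max_failures
        self._failures: Dict[str, int] = {}
        self.known_shards: Set[str] = set(reload_config())

    def record_success(self, shard: str) -> None:
        # A successful connection resets the failure count for that shard.
        self._failures.pop(shard, None)

    def record_failure(self, shard: str) -> bool:
        """Record a failed reconnect; return True if the caller should keep retrying."""
        if shard not in self.known_shards:
            return False  # already known to be removed, do not retry
        count = self._failures.get(shard, 0) + 1
        self._failures[shard] = count
        if count < self._max_failures:
            return True
        # Threshold reached: reload the sharding info and forget the counter.
        self.known_shards = set(self._reload_config())
        self._failures.pop(shard, None)
        # Keep retrying only if the shard is still part of the cluster.
        return shard in self.known_shards
```

With this shape, a shard that is merely down keeps being retried (its counter resets after each reload), while a shard that has been removed stops being retried after the next reload.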