[SERVER-4581] Mongos and mongod keep trying to reconnect to removed shard Created: 29/Dec/11  Updated: 11/Jul/16  Resolved: 15/Jun/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.2
Fix Version/s: 2.1.2

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Participants:

 Description   

When you remove a shard, it seems that the metadata associated with the removed shard doesn't get cleaned up, so if that shard is then taken offline, the mongoses and the other shards will keep trying to reconnect to it.



 Comments   
Comment by auto [ 15/Jun/12 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2012-06-12T08:25:46-07:00

Message: Remove connections to removed shard from connection pools. SERVER-4581
Branch: master
https://github.com/mongodb/mongo/commit/93c8997b3df1492a90ec41b97ec3193c49f460d5

Comment by auto [ 12/Jun/12 ]

Author: Spencer T Brody <spencer@10gen.com>
Date: 2012-03-26T11:31:33-07:00

Message: Clean up ReplicaSetMonitor when the whole set has been down for a long time. SERVER-4581
Branch: master
https://github.com/mongodb/mongo/commit/9ab21eeb9443c41455a18f3ff7016166a16a6425

Comment by Spencer Brody (Inactive) [ 27/Mar/12 ]

This is also a problem for the mongos.

Comment by Eliot Horowitz (Inactive) [ 30/Dec/11 ]

The 2nd is definitely better.
Then we can just cycle through the write back listener threads and kill any that aren't needed anymore.
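
The cleanup suggested here could look roughly like the sketch below. It is only an illustration in Python, not the server's C++ code: `Listener`, `prune_listeners`, and the polling placeholder are hypothetical names, and since Python threads cannot be killed outright, each listener is stopped cooperatively through an event.

```python
import threading
from typing import Dict, Iterable, List


class Listener:
    """Hypothetical stand-in for a per-shard write back listener thread."""

    def __init__(self, shard: str) -> None:
        self.shard = shard
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # Placeholder loop: a real listener would poll its shard here.
        while not self._stop.wait(0.01):
            pass

    def stop(self) -> None:
        # Ask the thread to exit, then wait for it if it was ever started.
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def is_alive(self) -> bool:
        return self._thread.is_alive()


def prune_listeners(listeners: Dict[str, Listener],
                    active_shards: Iterable[str]) -> List[str]:
    """Stop and drop every listener whose shard is no longer active.

    Returns the names of the shards whose listeners were removed.
    """
    active = set(active_shards)
    stale = sorted(name for name in listeners if name not in active)
    for name in stale:
        listeners.pop(name).stop()
    return stale
```

A caller would run `prune_listeners` right after reloading the shard list, passing the freshly loaded shard names as `active_shards`.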

Comment by Spencer Brody (Inactive) [ 30/Dec/11 ]

I see two possible approaches to solving this. Currently the removeShard command doesn't ever get sent to the actual shards, only to the mongos. One option is to have the removeShard command on the mongos call a removeShard command on each shard to tell it to remove its metadata associated with the removed shard. The other option is to have the shard reload the sharding info after some number of failed attempts to reconnect to a down shard. I'm leaning towards the latter right now, as it is probably a good idea to have some kind of behavior like this anyway.
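
A minimal sketch of the second option, in Python for brevity rather than the server's C++. Everything here is an assumption made for illustration: the `ShardConnectionTracker` class, the `load_shard_names` callback (standing in for re-reading the sharding metadata), and the default threshold of 3 failures are hypothetical and do not reflect the actual fix.

```python
from typing import Callable, Dict, Set


class ShardConnectionTracker:
    """Hypothetical illustration: reload shard info after repeated failures."""

    def __init__(self, load_shard_names: Callable[[], Set[str]],
                 max_failures: int = 3) -> None:
        # load_shard_names stands in for re-reading the sharding metadata.
        self._load_shard_names = load_shard_names
        self._max_failures = max_failures
        self._failures: Dict[str, int] = {}
        self.known_shards: Set[str] = set(load_shard_names())

    def record_success(self, shard: str) -> None:
        # A successful reconnect clears the failure count.
        self._failures.pop(shard, None)

    def record_failure(self, shard: str) -> bool:
        """Count a failed reconnect; return True if retrying still makes sense."""
        if shard not in self.known_shards:
            return False
        self._failures[shard] = self._failures.get(shard, 0) + 1
        if self._failures[shard] < self._max_failures:
            return True
        # Threshold reached: reload the shard list and reset the counter.
        self.known_shards = set(self._load_shard_names())
        self._failures.pop(shard, None)
        # Keep retrying only if the shard is still part of the cluster.
        return shard in self.known_shards
```

With this shape, a shard that is merely down keeps being retried after each reload, while a shard that was removed stops being retried once the threshold triggers a reload.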
