[SERVER-14863] Mongos ReplicaSetMonitorWatcher continues to monitor drained/removed shard Created: 12/Aug/14 Updated: 06/Dec/22 Resolved: 25/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.10, 2.6.3, 2.7.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Victor Hooi | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 9 |
| Labels: | ShardingRoughEdges | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Sharding
|
| Operating System: | ALL |
| Steps To Reproduce: | Create a sharded cluster with multiple mongos processes connected to it. For example, using mlaunch:
Enable sharding on a database/collection:
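The command for this step was not preserved; a hedged sketch, assuming the step refers to enabling sharding on a test database and collection (the names `test` and `test.data` and the shard key are hypothetical):

```shell
# Run against any one mongos; database, collection, and shard key are assumptions.
mongo --port 27017 --eval '
  sh.enableSharding("test");
  sh.shardCollection("test.data", { _id: 1 });
'
```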
Insert some dummy data:
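The original data-loading snippet was not preserved; a sketch, with collection name and document count as assumptions:

```shell
# Insert enough documents that chunks exist to migrate during the drain.
mongo --port 27017 test --eval '
  for (var i = 0; i < 100000; i++) { db.data.insert({ _id: i, payload: "x" }); }
'
```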
Increase the logging on each of your mongos processes to logging level 4:
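One way to raise the verbosity, assuming mongos routers listening on ports 27017-27019 (the ports are an assumption):

```shell
# Set log verbosity to level 4 on each mongos, one router per port.
for port in 27017 27018 27019; do
  mongo --port "$port" admin --eval 'db.adminCommand({ setParameter: 1, logLevel: 4 })'
done
```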
Start the draining process:
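Draining is started with the removeShard command through one mongos; the shard name shard02 matches the shard referenced later in this report, and the port is an assumption:

```shell
# First invocation marks shard02 as draining and begins chunk migration.
mongo --port 27017 admin --eval 'db.adminCommand({ removeShard: "shard02" })'
```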
After the chunks have finished draining, run it a second time to remove the shard:
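Per the documented removeShard pattern, the identical command is issued a second time once draining reports completion:

```shell
# Second invocation completes the removal once no chunks remain on shard02.
mongo --port 27017 admin --eval 'db.adminCommand({ removeShard: "shard02" })'
```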
Run flushRouterConfig on each mongos:
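flushRouterConfig only affects the router it is sent to, so it must be issued against each mongos individually; a sketch, again assuming routers on ports 27017-27019:

```shell
# Flush the cached routing table on every mongos.
for port in 27017 27018 27019; do
  mongo --port "$port" admin --eval 'db.adminCommand({ flushRouterConfig: 1 })'
done
```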
The mongos on which you performed the drain will only check for shard01:
However, the other mongos processes will still have a ReplicaSetMonitorWatcher checking for shard02:
After a restart of the affected mongos processes, they no longer monitor the removed shard.
|
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
We have a MongoDB sharded cluster with two shards, with multiple mongos processes connected to it. Through one of the mongos processes, we initiate a drain and removal of one of the shards. We also run flushRouterConfig on the other mongos processes. Those other mongos processes continue to run ReplicaSetMonitorWatchers that check for the removed shard. A restart of each mongos seems to be the only way to get it to recognise that the shard has been removed. I have tested the above behaviour against 2.4.10, 2.6.3 and 2.7.5 (Git version c184143fa4d8a4fdf4fdc684404d4aad3e55794b) |
| Comments |
| Comment by Benjamin Caimano (Inactive) [ 25/Jul/19 ] |
|
The ReplicaSetMonitorWatcher no longer exists. |
| Comment by David Murphy [ 18/Aug/14 ] |
|
I would think we should approach this more globally. If the topology of the cluster's nodes changes (that is, removeShard or addShard is executed), it should bump the config version in such a way that every mongos is forced to reload the config. That would prevent only some mongos processes knowing about the change, right? |
| Comment by Greg Studer [ 13/Aug/14 ] |
|
Well, rejecting connections via a firewall approach wouldn't require any processes to be stopped, but it isn't very elegant. |
| Comment by Victor Hooi [ 13/Aug/14 ] |
|
greg_10gen Thanks for that. I can confirm that if you take all of shard02 down (i.e. stop the entire replica set), the mongos does stop monitoring it after a while. I stopped the replica set at 13:50 (GMT+10). Below are the mongos logs afterwards:
So this is another way to do it. However, I suspect this isn't ideal either, and is probably just as intrusive as needing to restart the mongos. Are you aware of any way to remove a shard and stop the monitoring, without needing to restart or terminate any processes? |
| Comment by Greg Studer [ 12/Aug/14 ] |
|
If the replica set is taken offline (or firewalled off), the mongos should stop monitoring it after 5 minutes. Was this the case in your tests? I agree, though, that mongos could be smarter; if the shard is being repurposed, it may not be practical to shut it down for 5 minutes before reusing it. |