[SERVER-1767] Chunk balancing does not resume after a temporary network outage of a config server
Created: 09/Sep/10  Updated: 29/May/12  Resolved: 11/May/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.7.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Tony Hannan
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

db version v1.7.1-pre-, pdfile version 4.5
Thu Sep 9 15:50:22 git version: 10f750fa617e7499a89e9a140d89ddb4a7427ad6


Operating System: ALL
Participants:

 Description   

1. Set up one replica set (2 servers)
2. Add enough data to it in one collection so it will be split into at least 8 chunks when sharded later.
3. Set up 3 sharding config servers
4. Start one mongos pointing to above config servers
5. Add above replica set as sole shard
6. Set up another replica set (2 servers)
7. Add second replica set as second shard
8. Enable sharding on above db and collection.
9. Wait until the balancer moves the first chunk to the second shard, then cut one config server off from the network (use iptables to drop packets to and from it).
10. Wait at least one minute.
11. Reconnect the config server to the network (use iptables to allow packets to and from it again).
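
The network cut in steps 9 and 11 could be scripted roughly as follows. This is only a sketch under assumptions the report does not state: the rules are applied on a host that talks to the config server, the address 192.0.2.10 is a placeholder, and the function names are hypothetical. The code only builds the iptables command lines; running them requires root, for example via subprocess.run(cmd, check=True).

```python
import ipaddress
from typing import List


def isolate_commands(host_ip: str) -> List[List[str]]:
    """Build iptables commands that drop all packets to and from host_ip (step 9)."""
    # ip_address() raises ValueError on malformed input, so a typo never reaches iptables.
    ip = str(ipaddress.ip_address(host_ip))
    return [
        ["iptables", "-A", "INPUT", "-s", ip, "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


def restore_commands(host_ip: str) -> List[List[str]]:
    """Build iptables commands that delete the rules added above (step 11)."""
    ip = str(ipaddress.ip_address(host_ip))
    return [
        ["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"],
        ["iptables", "-D", "OUTPUT", "-d", ip, "-j", "DROP"],
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only placeholder, not an address from the report.
    for cmd in isolate_commands("192.0.2.10") + restore_commands("192.0.2.10"):
        print(" ".join(cmd))
```

Using -D with the same rule specification removes exactly the rules that -A appended, so other firewall rules on the host are left alone.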

Problem: No more chunks will move to the second shard, even if you add more data.
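
One way to confirm the symptom is to count chunks per shard before and after adding data and check that the counts for the second shard never change. The helper below is a minimal sketch, assuming chunk metadata documents shaped like those in the config database's chunks collection, with an "ns" field and a "shard" field. The function name, namespace, and shard names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def chunks_per_shard(chunk_docs: Iterable[Mapping], ns: str) -> Dict[str, int]:
    """Count chunks per shard for one namespace.

    Assumes each document carries "ns" (namespace) and "shard" (owning shard) fields.
    """
    return dict(Counter(doc["shard"] for doc in chunk_docs if doc.get("ns") == ns))


if __name__ == "__main__":
    # Hypothetical snapshot: 7 chunks still on the first shard, 1 moved to the second.
    snapshot = [{"ns": "test.foo", "shard": "rs1"}] * 7 + [{"ns": "test.foo", "shard": "rs2"}]
    print(chunks_per_shard(snapshot, "test.foo"))
```

If two snapshots taken several minutes apart show more chunks in total but the same count on the second shard, that is consistent with the balancer not having resumed.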



 Comments   
Comment by Greg Studer [ 11/May/11 ]

Pretty sure this is a duplicate of SERVER-3024 - unlocking should retry now.
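
As a generic illustration of what "unlocking should retry" can mean, the sketch below retries a release operation when the connection fails instead of giving up after the first error. It is not the server's actual implementation; the function name, parameters, and the use of ConnectionError are assumptions made for the example.

```python
import time
from typing import Callable


def unlock_with_retry(
    unlock: Callable[[], None],
    attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call unlock() until it succeeds and return the number of attempts used.

    Re-raises the last ConnectionError if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            unlock()
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(delay_s)  # wait before trying again
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Without a retry, a single failed unlock during the outage could leave the lock held after the network recovers, which would match the stalled balancing described in this ticket.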

Comment by Eliot Horowitz (Inactive) [ 18/Nov/10 ]

We don't have a good reproducible case for it, so that is still unclear.

Comment by Benedikt Waldvogel [ 18/Nov/10 ]

Does this also happen on 1.6.3? Do you have a workaround (e.g. restarting all config servers)?

Generated at Thu Feb 08 02:57:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.