[SERVER-23975] migrating chunks from one shard to another which is also set to be removed

Created: 28/Apr/16  Updated: 04/May/16  Resolved: 29/Apr/16

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Rob Reid |
| Assignee: | Unassigned |
| Resolution: | Done |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
Description

We are migrating data from a legacy set of shards to a new set. Looking at the logs, I see that it is moving chunks from one shard to another which is also in the set of shards to be removed. Of course this means that the contents of that chunk are going to be migrated more than once, making the already painfully slow balancing process even worse. Here's an excerpt from the log showing that it is moving a chunk from Shard_18 to Shard_83:
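For context, one way to see which migrations involve shards that are being drained is to query the config metadata from a mongos. The following is a minimal sketch, not output from this cluster; it assumes the standard config.shards and config.changelog collections, and that moveChunk.commit changelog entries record their source and destination shards under details.from and details.to:

```
// Run against a mongos. Shards that have been passed to removeShard carry
// "draining" : true in the config metadata.
var configDB = db.getSiblingDB("config");
var draining = configDB.shards.find({ draining: true }).map(function (s) { return s._id; });
printjson(draining);

// Recent committed chunk migrations, showing source and destination shards.
// A migration whose destination is also in the draining list will have to be
// moved again once that destination shard drains.
configDB.changelog.find(
    { what: "moveChunk.commit" },
    { time: 1, ns: 1, "details.from": 1, "details.to": 1 }
).sort({ time: -1 }).limit(20).forEach(printjson);
```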
Comments

Comment by Kevin Pulo [ 04/May/16 ]

As explained on
Comment by Rob Reid [ 03/May/16 ]

FYI, I've now come to understand that the source of my confusion was the intermingling of log data from two different sources. The output indicating that a set of shards is set to be removed was from a script by our OPS team.
Comment by Ramon Fernandez Marina [ 03/May/16 ]

Please see also
Comment by Ramon Fernandez Marina [ 29/Apr/16 ]

I've opened
Comment by Rob Reid [ 29/Apr/16 ]

I understand that only one shard can be in the draining state at a time. I was surprised that multiple shards could be marked for removal; I haven't seen that documented.
Comment by Ramon Fernandez Marina [ 29/Apr/16 ]

robreid, in a sharded cluster only one shard can be in the draining state at a time. Here's an example on a local setup:

Other than the corner case described in

Please take a look at the documentation on migrating a sharded cluster to new hardware. Assuming your shards are replica sets, the preferred approach is to replace replica set members, not complete shards; this will accomplish the upgrade via initial syncs as opposed to chunk migrations.

Regards,
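A minimal sketch of the kind of local check referred to here, with illustrative shard names and assuming a 3.0-era cluster (the exact response text may differ by version):

```
// Run against a mongos. Start draining one shard, then attempt to drain a
// second one while the first is still draining.
db.adminCommand({ removeShard: "shard0001" })
// => state "started"; config.shards now shows "draining" : true for shard0001

db.adminCommand({ removeShard: "shard0002" })
// => expected to fail (ok: 0) while shard0001 is still draining, since only
//    one shard can be drained at a time in this version

db.getSiblingDB("config").shards.find({ draining: true })
// => only shard0001 is listed
```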
Comment by Rob Reid [ 28/Apr/16 ]

These are from a PaperTrail presentation which includes log entries from many hosts; I don't believe I can access the logs directly. My understanding is that the OPS team executed a loop making removeShard calls for each of the specific shards. It was supposed to submit only one shard at a time and block on submitting further shards until that shard had been removed. The config.shards collection shows only one shard in "draining" state, but the set in the log, "Shard(s) to remove", suggests that they were all submitted.
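For reference, a minimal sketch of the kind of loop described above, run from the mongo shell against a mongos; the shard names are illustrative, and repeated removeShard calls are the standard way to check drain progress:

```
// Drain shards strictly one at a time: keep re-issuing removeShard for the
// current shard and only move on once its state reaches "completed".
var shardsToRemove = ["Shard_18", "Shard_83"];   // illustrative names
shardsToRemove.forEach(function (name) {
    while (true) {
        var res = db.adminCommand({ removeShard: name });
        printjson({ shard: name, state: res.state, remaining: res.remaining });
        if (res.state === "completed") break;    // safe to start the next shard
        // Note: if this shard is the primary shard for any database, the drain
        // will not finish until movePrimary has been run for those databases.
        sleep(60 * 1000);                        // poll once a minute
    }
});
```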
Comment by Ramon Fernandez Marina [ 28/Apr/16 ]

robreid, can you please elaborate on how you are removing the old shards and how the non-mongod log lines above are being generated? Are you running the balancer or manually moving chunks? It is possible you're running into

Thanks,
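A minimal sketch of how the balancer question might be answered from a mongos, using the standard shell helpers:

```
// From the mongo shell connected to a mongos:
sh.getBalancerState()     // true if the balancer is enabled for the cluster
sh.isBalancerRunning()    // true if a balancing round is in progress right now
sh.status()               // summary including balancer state and chunk distribution
```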