[SERVER-16582] Chunk Migration Failing Repeatedly on Initial Balancing Round Created: 18/Dec/14 Updated: 24/Jan/15 Resolved: 18/Dec/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.8.0-rc2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Cross | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
Expected result: Data is in several chunks, and the load is balanced with no errors.
Actual result: Migration failures as the load balancer starts: "Failed with error 'chunk too big to move', from shard01 to shard03" and "Failed with error 'chunk too big to move', from shard01 to shard02", though the chunks seem to eventually get where they need to go.
Here is the output of sh.status() partway through:
|
| Participants: |
| Description |
|
The balancer is getting stuck on its initial balancing load. |
| Comments |
| Comment by William Cross [ 18/Dec/14 ] |
|
scotthernandez, I think I misunderstood what was happening. Also, when I opened the ticket, I had originally thought that the balancer was abandoning the balancing process (though it was taking a while). In any case, I am closing the ticket. |
| Comment by Scott Hernandez (Inactive) [ 18/Dec/14 ] |
|
I don't know what you mean; can you explain a bit more? When sharding existing collections, all the chunks must be on a single shard and then be distributed over time. The balancer and shard commands (shardCollection, enableSharding, splitCollection, etc) are independent processes, so as soon as the chunk metadata exists, the balancer will start working from it. |
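The interaction described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's actual balancer logic: the `Chunk`, `Shard`, `split_oversized`, `balance_round`, and `simulate` names are hypothetical, the 64 MB limit and the halving split are simplifications, and the balancing rule (move one chunk when the chunk-count difference is at least 2) is invented for illustration. It shows how a balancer that runs independently of splitting can log 'chunk too big to move' failures early on and still end up balanced.

```python
from dataclasses import dataclass, field

MAX_CHUNK_MB = 64  # assumed size limit for this toy model


@dataclass
class Chunk:
    size_mb: int


@dataclass
class Shard:
    name: str
    chunks: list = field(default_factory=list)


def split_oversized(shard, max_mb=MAX_CHUNK_MB):
    """Split the first oversized chunk on the shard in half; return True if one was split."""
    for i, chunk in enumerate(shard.chunks):
        if chunk.size_mb > max_mb:
            half = chunk.size_mb // 2
            shard.chunks[i:i + 1] = [Chunk(half), Chunk(chunk.size_mb - half)]
            return True
    return False


def balance_round(shards, max_mb=MAX_CHUNK_MB):
    """Try to move one chunk from the shard with the most chunks to the one with the fewest."""
    donor = max(shards, key=lambda s: len(s.chunks))
    recipient = min(shards, key=lambda s: len(s.chunks))
    if len(donor.chunks) - len(recipient.chunks) < 2:
        return "balanced"
    chunk = donor.chunks[0]
    if chunk.size_mb > max_mb:
        # The chunk has not been split small enough yet, so the move is refused.
        return (f"Failed with error 'chunk too big to move', "
                f"from {donor.name} to {recipient.name}")
    donor.chunks.pop(0)
    recipient.chunks.append(chunk)
    return f"moved {chunk.size_mb} MB chunk from {donor.name} to {recipient.name}"


def simulate(total_mb=400, max_rounds=50):
    """Start with all data in one chunk on shard01; interleave splitting and balancing."""
    shards = [Shard("shard01", [Chunk(total_mb)]), Shard("shard02"), Shard("shard03")]
    log = []
    for _ in range(max_rounds):
        split_oversized(shards[0])       # splitting proceeds on its own schedule
        result = balance_round(shards)   # the balancer uses whatever metadata exists now
        log.append(result)
        oversized_left = any(c.size_mb > MAX_CHUNK_MB for s in shards for c in s.chunks)
        if result == "balanced" and not oversized_left:
            break
    return shards, log


if __name__ == "__main__":
    final_shards, events = simulate()
    for line in events:
        print(line)
    for s in final_shards:
        print(s.name, [c.size_mb for c in s.chunks])
```

In this model the first rounds fail because the only candidate chunk is still larger than the limit; once splitting catches up, moves succeed and the chunk counts even out, which matches the "eventually get where they need to go" observation in the report.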
| Comment by William Cross [ 18/Dec/14 ] |
|
Files attached. Yes, they were jumbo chunks, but why was a migration attempted while the chunks were still getting their initial set of chunk splits? Shouldn't it wait a couple of minutes before trying & failing? I'm not seeing this behavior in 2.6, but maybe it's just that I noticed it in 2.8. |
| Comment by Scott Hernandez (Inactive) [ 18/Dec/14 ] |
|
Please provide logs and a dump of the config database. At first glance this looks like totally normal behavior if there are jumbo chunks. |