[SERVER-5910] slow sharding already existed collections Created: 23/May/12 Updated: 08/Mar/13 Resolved: 23/Oct/12
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.1.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Azat Khuzhin | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | performance |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Operating System: | ALL |
| Participants: | Azat Khuzhin, Eliot Horowitz, Spencer Brody |
| Description |
Results of migrating one chunk (default size: 64 MB) of a collection with 180 million rows. Even in the best case the migration is slow (245 secs), and "removeShard" is also not fast. Logs are attached (iostat, mongostat, mongotop, vmstat; some tests also include mongod/mongos logs).

Related links:
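For context, a minimal sketch of the commands involved in such a test, run from a mongos shell. The namespace ("test.bigcollection") and shard names are hypothetical, not taken from this ticket:

    // Drain a shard; rerunning the same command reports draining progress.
    db.adminCommand({ removeShard: "shard0001" })
    // e.g. { msg: "draining ongoing", remaining: { chunks: ... }, ok: 1 }

    // Or move a single chunk (up to the 64 MB default chunk size) by hand
    // and time it:
    var t0 = new Date();
    db.adminCommand({
        moveChunk: "test.bigcollection",   // hypothetical namespace
        find: { _id: 42 },                 // any shard-key value inside the chunk
        to: "shard0000"                    // hypothetical destination shard
    });
    print((new Date() - t0) / 1000 + " secs");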
| Comments |
| Comment by Spencer Brody (Inactive) [ 23/Oct/12 ] |
I'm closing out this ticket. If you are able to reproduce this on 2.2, with MMS and the logs and other monitoring output that I mentioned earlier, feel free to re-open.
| Comment by Azat Khuzhin [ 04/Sep/12 ] |
Hi Spencer, I know this, but I stopped all application instances, moved the data manually, and then restarted the application. I no longer have that configuration. If you can start the investigation without my results, please do.
| Comment by Spencer Brody (Inactive) [ 04/Sep/12 ] |
Hi Azat, before we go any further here, I noticed that you are running the 2.1.1 unstable developer release. Can you try running these tests again using the newly-released stable 2.2 release? There have been a lot of changes in 2.2 since 2.1.1, and I want to make sure we aren't chasing down a bug that has already been fixed. Having MMS set up for the next attempt (along with capturing mongostat and iostat as you did for the last run) and sending the mongodb logs from the test run would all help us understand what's happening here.
| Comment by Azat Khuzhin [ 28/Aug/12 ] |
I understand what happened, but it is too slow. Some time ago, because sharding was slow and I didn't have enough time, I ran the "removeShard" command, and it was slower than migrating the chunks individually.
Migrating chunks individually worked about 100 times faster than "removeShard", maybe even another order of magnitude (1000x); I don't remember exactly now.
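As an illustration of the comparison above, a sketch of moving chunks off a shard explicitly instead of draining it with "removeShard"; the namespace and shard names are assumed, not taken from the ticket:

    // From a mongos shell: walk the chunks living on the shard to be
    // emptied and migrate each one explicitly.
    var ns = "test.bigcollection";    // hypothetical namespace
    db.getSiblingDB("config").chunks
      .find({ ns: ns, shard: "shard0001" })
      .forEach(function (chunk) {
          printjson(db.adminCommand({
              moveChunk: ns,
              find: chunk.min,    // the chunk's lower bound identifies it
              to: "shard0000"     // hypothetical destination shard
          }));
      });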
| Comment by Spencer Brody (Inactive) [ 28/Aug/12 ] |
If you shard a collection when it has no data in it and then insert a bunch of data, the system will be splitting and balancing as the data is being inserted, so when all the data is inserted the amount remaining to be balanced should be small. If you insert a bunch of data first and then shard the collection, the data will all initially live on one shard and will have to be migrated over a longer period to come into balance. To clarify, are you saying that the system runs slow, that the time to completely balance is slow, or that the individual migrations are slow?
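To make the two orderings concrete, a minimal sketch in the mongo shell; the database, collection, and shard-key names are illustrative only:

    // Sharding must be enabled on the database in either case.
    sh.enableSharding("test")

    // Ordering 1: shard the empty collection, then insert. Splits and
    // migrations happen while the data arrives, so little balancing work
    // remains at the end.
    sh.shardCollection("test.events", { _id: 1 })
    for (var i = 0; i < 1000000; i++) {
        db.getSiblingDB("test").events.insert({ _id: i })
    }

    // Ordering 2: insert first, then shard. Every chunk starts out on the
    // collection's original shard and must be migrated afterwards, which
    // is the slow path this ticket describes.
    for (var i = 0; i < 1000000; i++) {
        db.getSiblingDB("test").bulk.insert({ _id: i })
    }
    sh.shardCollection("test.bulk", { _id: 1 })
    sh.status()   // chunks still concentrated on one shard at this point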
| Comment by Azat Khuzhin [ 28/Aug/12 ] |
My English is not so good. When I shard an already-existing collection, it is very slow.
| Comment by Spencer Brody (Inactive) [ 27/Aug/12 ] |
I'm sorry, perhaps I misunderstand what you're doing. Are you saying that you're seeing these long migration times even on new collections with no data in them? Can you clarify what you mean when you say the speed isn't comparable between sharding an existing collection and sharding the collection from the beginning?
| Comment by Azat Khuzhin [ 27/Aug/12 ] |
Hi Spencer, I understand that it needs I/O seeks and has to read documents from disk.
| Comment by Spencer Brody (Inactive) [ 27/Aug/12 ] |
Hi Azat,
| Comment by Azat Khuzhin [ 24/May/12 ] |
It was a virtual machine, an Amazon EC2 m1.medium.
| Comment by Eliot Horowitz (Inactive) [ 24/May/12 ] |
What was the underlying hardware when running the test?
| Comment by Azat Khuzhin [ 24/May/12 ] |
I've already shut down these two instances.
| Comment by Eliot Horowitz (Inactive) [ 24/May/12 ] |
Can you install MMS for this cluster, along with Munin, so we can see where the bottleneck is?