[SERVER-8022] Balancer doesn't work Created: 23/Dec/12 Updated: 15/Feb/13 Resolved: 27/Dec/12
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin, Storage |
| Affects Version/s: | 2.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | larry.loi | Assignee: | Thomas Rueckstiess |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Production environment, MongoDB 2.2.1 64-bit for Linux |
| Attachments: | sh_status.txt |
| Operating System: | ALL |
| Description |

Most of the data is held on one shard server; the data is not distributed across all shard servers. I tried stopping and starting the balancer, but that didn't help.
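For reference, a minimal sketch of the balancer stop/start the reporter describes, using standard mongos shell helpers (illustrative only, not the reporter's actual session):

```javascript
// Sketch: toggling and inspecting the balancer from a mongos shell.
sh.getBalancerState();       // is the balancer currently enabled?
sh.setBalancerState(false);  // stop the balancer
sh.setBalancerState(true);   // start it again
sh.isBalancerRunning();      // is a balancing round in progress right now?
```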
| Comments |
| Comment by Thomas Rueckstiess [ 28/Dec/12 ] |

Hi Larry,

1. Deleting data is a normal operation, but since you deleted only data prior to 2012-12-19 and your shard key contains a date component, the deletion has left behind empty chunks.

2. A chunk is nothing more than a range (min - max) over each component of the shard key. The empty chunks on your system all have date ranges prior to 2012-12-19 and therefore won't be reused, because every new document you insert now has a later date.

3. The "coarsely ascending key + search key" recommendation in Kristina Chodorow's book used months instead of days (which is much coarser) and didn't cover your scenario, where old data is being deleted.

Please also see the new support ticket SUPPORT-439 that I've opened for you for additional questions. We will continue to work on resolving this issue for you there.

Regards,
Thomas
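A minimal shell sketch of checking for such empty chunks, assuming the namespace fryatt_mcms_production.counter_data and the shard key { collect_date: 1, game_session_id: 1 } mentioned in this ticket; dataSize is a standard server command:

```javascript
// Sketch: walk the chunk metadata and flag chunks that hold no documents.
// Namespace and shard key are taken from this ticket; adjust as needed.
var ns = "fryatt_mcms_production.counter_data";
db.getSiblingDB("config").chunks.find({ ns: ns }).forEach(function (chunk) {
    var res = db.getSiblingDB("fryatt_mcms_production").runCommand({
        dataSize: ns,
        keyPattern: { collect_date: 1, game_session_id: 1 },
        min: chunk.min,
        max: chunk.max
    });
    if (res.ok && res.numObjects === 0) {
        print("empty chunk on " + chunk.shard + ": " +
              tojson(chunk.min) + " -> " + tojson(chunk.max));
    }
});
```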
| Comment by larry.loi [ 27/Dec/12 ] |

First of all, I deleted some old data (prior to 2012-12-19) from the collection counter_data; I did not delete any chunks. That should be a normal operation. Second, does new data re-use the empty chunks? Third, our shard key combines collect_date (a date) and game_session_id (a UUID). This is a configuration recommended in "Scaling MongoDB" (coarsely ascending key + search key), and I think it fits our situation and most others.
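For illustration, this is roughly how the "coarsely ascending key + search key" pattern Larry describes would be set up; the names are taken from this ticket, and the exact command used in production is not shown in the original exchange:

```javascript
// Sketch of the "coarsely ascending key + search key" pattern from
// "Scaling MongoDB": a coarse date component plus a search key.
sh.enableSharding("fryatt_mcms_production");
sh.shardCollection("fryatt_mcms_production.counter_data",
                   { collect_date: 1, game_session_id: 1 });
```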
| Comment by Thomas Rueckstiess [ 26/Dec/12 ] |

Deleting all data prior to 2012-12-19 would explain the uneven distribution of documents, as the affected chunks remain empty on their shards. We don't recommend manually deleting empty chunks, as it can cause other problems. Unfortunately, with your date-based shard key, pruning old data leaves those chunks empty forever and thus confuses the balancer. There is an open SERVER issue that describes your problem; it can be considered a feature request to facilitate such an operation.

Generally, we do not recommend using a monotonically increasing field (such as ObjectIds, dates, or timestamps) as a shard key, because new data will then only ever be inserted into a single shard.

Now that we've determined that this is not a bug, I've opened the separate SUPPORT-439 ticket for you and would like us to continue all ongoing discussion regarding possible solutions privately there.

Regards,
Thomas
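One common alternative to a purely ascending key, shown here as a sketch rather than a recommendation made in this thread, is a hashed shard key, which spreads inserts across shards. Note that hashed shard keys arrived in MongoDB 2.4, one release after the 2.2.1 reported here:

```javascript
// Sketch (assumes MongoDB 2.4+): a hashed shard key distributes inserts
// evenly across shards even when the underlying field is ascending.
sh.shardCollection("fryatt_mcms_production.counter_data",
                   { game_session_id: "hashed" });
```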
| Comment by larry.loi [ 26/Dec/12 ] |

Some old data was removed, everything with collect_date before 2012-12-19. Do I need to remove the old chunks manually?
| Comment by larry.loi [ 26/Dec/12 ] |

Yes, the chunks are distributed, but in my case the data isn't. I checked, and there was no big jump in traffic or customers. Hmm, it is quite strange. Here is the db.counter_data.stats(); all the rows are inside "shard0001".

[db.counter_data.stats() output lost in this export]
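Through a mongos, the collection stats include a per-shard breakdown; a small sketch for reading the document count per shard (database and collection names taken from this ticket):

```javascript
// Sketch: print document count and data size per shard from collStats
// (the "shards" sub-document is present when run through a mongos).
var s = db.getSiblingDB("fryatt_mcms_production").counter_data.stats();
for (var shard in s.shards) {
    print(shard + ": " + s.shards[shard].count + " docs, " +
          s.shards[shard].size + " bytes");
}
```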
| Comment by Thomas Rueckstiess [ 26/Dec/12 ] |

It appears that the chunks are evenly distributed:

[per-shard chunk counts lost in this export]

However, chunks are only balanced by number, not by their actual size, so it's possible that some chunks carry far more documents than others. Usually, when that happens, large chunks are split into smaller ones and distributed over the shards. In some cases a chunk can become unsplittable, for example if all of its documents contain the same shard key; in your case, that would be the same date and game_session_id. Are you aware of any irregularities where a substantially higher number of documents with the same date and game_session_id were inserted?

I've also noticed that shard0001 contains the most recent chunks, with date ranges from Dec 19 to 24. I'm wondering whether the unequal distribution could simply be a natural artifact of the days leading up to Christmas being busier than others. Or did you perhaps experience a large increase in traffic or customers over the last few days?

Could you please attach a db.counter_data.stats() from the fryatt_mcms_production database? This would show us the distribution of actual documents per shard for that particular collection. If we see the same discrepancy in the document numbers, we can then continue by finding out in which chunk(s) all these documents reside and why they aren't being split and balanced.

Regards,
Thomas
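Since the balancer of this era equalises chunk counts rather than data sizes, a quick way to see the per-shard chunk counts Thomas refers to is to aggregate the chunk metadata directly (a sketch; the namespace is taken from this ticket):

```javascript
// Sketch: count chunks per shard for the collection; this is the number
// the balancer equalises, independent of how full each chunk is.
db.getSiblingDB("config").chunks.aggregate([
    { $match: { ns: "fryatt_mcms_production.counter_data" } },
    { $group: { _id: "$shard", chunks: { $sum: 1 } } }
]);
```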
| Comment by larry.loi [ 25/Dec/12 ] |

Please see the attached sh_status.txt.
| Comment by larry.loi [ 25/Dec/12 ] |

Please see the output of db.locks.find().pretty():

[mongos output lost in this export]
| Comment by Thomas Rueckstiess [ 24/Dec/12 ] |

Hi Larry,

Can we have a look at the output of a sh.status(true) from the mongos? If the list is very long, you can also pipe it into a file from your command-line prompt and attach the file to the ticket.

Additionally, I'd like to look at the locks collection. Can you attach the output of querying it (also run in the mongos shell) as well?

Thanks,
Thomas
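The command blocks in this comment were lost in the export; a plausible reconstruction, with the mongos host and port as placeholders:

```javascript
// Reconstruction of the stripped command blocks (host/port placeholders).
// From the OS command line, dump the verbose sharding status to a file:
//
//   mongo <mongos-host>:<mongos-port> --eval "sh.status(true)" > sh_status.txt
//
// In the mongos shell, the distributed locks (including the balancer
// lock) live in the config database:
db.getSiblingDB("config").locks.find().pretty()
```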
| Comment by larry.loi [ 23/Dec/12 ] |

Most of the data is held on shard "prd-mcms-vdb01.prod.laxigames.com:27021".

mongos> db.stats()
[output lost in this export]
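When run through a mongos, db.stats() also returns a "raw" sub-document with the per-shard figures; a sketch for reading it (database name taken from this ticket):

```javascript
// Sketch: print per-shard object counts from dbStats via a mongos
// (the "raw" field holds each shard's individual statistics).
var s = db.getSiblingDB("fryatt_mcms_production").stats();
for (var shard in s.raw) {
    print(shard + ": " + s.raw[shard].objects + " objects");
}
```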