[SERVER-9262] Mongo distributed locks aren't released => balancing stops working Created: 05/Apr/13 Updated: 11/Jul/16 Resolved: 13/Apr/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | James Blackburn | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Description |
|
We have a shard that is nearly full, while other, more recently added shards are nearly empty. I was wondering why a particularly large sharded collection is much larger on two shards than on the others:
The data on rs0/rs1 is 10x the size of that on 2, 3, 4. Looking at the config DB logs:
The last trace for mongoose.centaur is from 36 hours ago. If I look in config.locks, I see:
There are many 'old' entries in this collection in state 0, some from February and March. It's not clear to me what state 0 is, and whether this is important... In the mongos log on dlonapahls211, for 2013-04-04T09:30:31.958, I see:
Any ideas on what the issue might be? How do I go about getting mongoose.centaur balanced again? Is there any way to see which collections Mongo thinks are eligible for balancing, i.e. how can I tell whether it will eventually balance or will never balance? |
| Comments |
| Comment by kishore battula [ 30/Jul/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you, Dan. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Pasette (Inactive) [ 29/Jul/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
kishore25kumar, to set the jumbo flag to false, you need to make a modification to the chunks collection in the config database directly from the mongos.
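The modification described above can be sketched as follows. This is a minimal sketch rather than the exact command from the original comment: it assumes the legacy mongo shell, the `config.chunks` schema of that era (chunks keyed by an `ns` field), and it removes the `jumbo` field with `$unset` instead of writing `false`. The helper name `clearJumboFlags` is hypothetical. Stopping the balancer and backing up the config database first is a sensible precaution.

```javascript
// Sketch: clear the "jumbo" marker on every flagged chunk of one namespace.
// `chunksColl` is assumed to be config.chunks as seen through a mongos,
// e.g. db.getSiblingDB("config").chunks in the legacy mongo shell.
// The function name is hypothetical, not part of MongoDB.
function clearJumboFlags(chunksColl, ns) {
  var query = { ns: ns, jumbo: true };      // only chunks currently flagged
  var update = { $unset: { jumbo: "" } };   // remove the flag entirely
  // Legacy shell signature: update(query, update, upsert, multi).
  chunksColl.update(query, update, false, true);
  return { query: query, update: update };
}

// Example (mongo shell, connected to a mongos):
// clearJumboFlags(db.getSiblingDB("config").chunks, "mongoose.centaur");
```

If a chunk is still larger than the configured chunk size, the balancer is likely to flag it as jumbo again on its next attempt to move it.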
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by kishore battula [ 26/Jul/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you please tell me how to set the jumbo flag to false? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Stennie Steneker (Inactive) [ 13/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi James, Thanks for confirming .. closing the issue. Cheers, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 12/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Wonderful, thanks Eliot and Stephen. Chunks balancing nicely now. Cheers, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you look at the chunks collection, you'll probably see a "jumbo" flag. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
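One way to check for the flag mentioned above is to pull the collection's chunk documents and count the flagged ones per shard. The sketch below assumes chunk documents with `shard` and `jumbo` fields, as in `config.chunks` of that era; the helper name `jumboCountsByShard` is hypothetical.

```javascript
// Sketch: count jumbo-flagged chunks per shard from an array of
// config.chunks documents. Chunks without jumbo: true are ignored.
function jumboCountsByShard(chunkDocs) {
  var counts = {};
  chunkDocs.forEach(function (chunk) {
    if (chunk.jumbo === true) {
      counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
    }
  });
  return counts;
}

// Example (mongo shell, connected to a mongos):
// var docs = db.getSiblingDB("config").chunks
//              .find({ ns: "mongoose.centaur" }).toArray();
// printjson(jumboCountsByShard(docs));
```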
| Comment by James Blackburn [ 12/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Magic! I hadn't realised that affected moving in addition to splitting. I've bumped the limit to 2048 (to allow moving chunks around a bit). The balancer still seems to be ignoring the chunks (I've turned it off and on again, and run flushRouterConfig). The moveChunk command I ran manually above seemed to work. So I could go through and move a bunch of chunks manually. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 12/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
You can change the chunk size in config.settings. Connect to a mongos,
then update the chunk size to 256 MB or larger. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
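The chunk size change described above can be sketched as follows. It assumes the documented layout of `config.settings`, where the cluster-wide chunk size lives in the document with `_id: "chunksize"` and `value` is in megabytes; the helper name `chunkSizeSetting` is hypothetical.

```javascript
// Sketch: build the config.settings document that stores the chunk size.
// sizeMB is the desired chunk size in megabytes.
function chunkSizeSetting(sizeMB) {
  var valid = typeof sizeMB === "number" && sizeMB > 0 &&
              Math.floor(sizeMB) === sizeMB;
  if (!valid) {
    throw new Error("chunk size must be a positive whole number of megabytes");
  }
  return { _id: "chunksize", value: sizeMB };
}

// Example (legacy mongo shell, connected to a mongos):
// db.getSiblingDB("config").settings.save(chunkSizeSetting(256));
```

Raising the value does not split or merge existing chunks; it only changes the threshold used for later splits and migrations.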
| Comment by James Blackburn [ 12/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ah, got it. Got brave and tried to moveChunk manually. The chunks are too big. We added 2 shards first, then the remaining shards later:
166MB doesn't seem all that big - some of our data are 1GB (uncompressed) and we are following the recommendation of storing all the chunks on one shard (so a query hits one shard only by default). Is there any way to raise the moveChunk limit? Failing that I guess I can dump and re-insert the data. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 11/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you attach a mongodump of the entire config database to SUPPORT-534? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 11/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Interestingly, the number of chunks has increased by a few, but this collection is still not balanced:
Anything else I can look at to try to get to the bottom of why this collection doesn't balance? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 08/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks both. Changelog uploaded to: SUPPORT-534 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Stennie Steneker (Inactive) [ 07/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi James, Can you create an issue in the Community Private project with the changelog zip attached: Please also reference this SERVER issue in the description. Correspondence and attachments in Community Private can only be viewed by yourself and 10gen support. Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 07/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sure. Is there any way to send / attach the file privately? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 07/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you send the contents of the changelog collection on the config servers? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 07/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Looks like nothing has happened in mongoose.centaur since yesterday, the chunk distribution is the same... Is there anything else you wanted from sh.status()? Anything else I could look at? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 06/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The balancer does appear to be working. As I drop and re-insert data (as detailed in SERVER-9243), I see that the data is ending up well partitioned.
print_db_stats:
I also notice that, as the balancer works, count() returns different answers (the right answer is 30076):
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 06/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you send sh.status()? How many mongos do you have? Can you check that the clocks are synchronized and that ntp is running on all the machines? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by James Blackburn [ 06/Apr/13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The balancer doesn't seem to be doing anything this morning. I notice errors like this in a mongos log:
|