[SERVER-6790] '[Balancer] Assertion: 13141:Chunk map pointed to incorrect chunk' error in the mongos log Created: 17/Aug/12 Updated: 15/Feb/13 Resolved: 20/Aug/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.0-rc1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Vladimir Poluyaktov | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
All servers are EC2 hosts, Ubuntu 11.04 (GNU/Linux 2.6.38-13-virtual x86_64) |
||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | Linux |
| Participants: | |
| Description |
|
Recently we sharded a few big collections in our database (2.2.0-rc1). The first collection was balanced just fine, but for the second collection we get a balancer error in the mongos log every few seconds:

Fri Aug 17 00:56:27 [Balancer] ns: oggiReporting.playlist.mainStats going to move { _id: "oggiReporting.playlist.mainStats-campaignId_MinKeyplaylistId_MinKeydate_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('502be6c21d848a61a13d51eb'), ns: "oggiReporting.playlist.mainStats", min: { campaignId: MinKey, playlistId: MinKey, date: MinKey }, max: { campaignId: "014869ee-5d62-4212-a609-40edcc2169d3", playlistId: "0c5bb01a-5780-413d-8d7c-b22c722e6ad4", date: new Date(1309309200000) }, shard: "RS01" } from: RS01 to: RS02 tag []
*c: ns:oggiReporting.domain.mainStats at: RS02:RS02/RPTDB-RS02-Zuse1b-S01.oggifinogi.com:27018,RPTDB-RS02-Zuse1c-S01.oggifinogi.com:27018,RPTDB-RS02-Zuse1d-S01.oggifinogi.com:27018 lastmod: 2|0||000000000000000000000000
min: { campaignId: MinKey, playlistId: MinKey, domain: MinKey, date: MinKey }
max: { campaignId: "0095af50-203b-4e1a-9337-6dad60a46688", playlistId: "09038ddd-ac89-4818-b6c3-90c3c43cdce9", domain: "biography.com", date: new Date(1311120000000), partition: "2011-07" }
key: { campaignId: "0095af50-203b-4e1a-9337-6dad60a46688", playlistId: "09038ddd-ac89-4818-b6c3-90c3c43cdce9", domain: "biography.com", date: new Date(1311120000000) }
Fri Aug 17 00:56:27 [Balancer] Assertion: 13141:Chunk map pointed to incorrect chunk

I tried rebooting mongos as well as the config and replica set servers - no luck. |
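The assertion indicates that the balancer's chunk map returned a chunk whose range does not match the key being looked up. The chunk metadata behind that map lives in the config database and can be inspected directly; a minimal sketch, assuming a mongos reachable at the hypothetical address below (the hostname merely follows the naming used elsewhere in this ticket):

```shell
# A sketch only - the mongos host below is a hypothetical placeholder.
# Lists the chunk ranges for the failing collection, sorted by min key,
# so gaps or overlaps in the range chain can be spotted by eye.
mongo --host RPTSRVC-Zuse1b-S01.oggifinogi.com:27017 --eval '
  db.getSiblingDB("config").chunks
    .find({ ns: "oggiReporting.domain.mainStats" })
    .sort({ min: 1 })
    .forEach(function (c) {
      printjson({ min: c.min, max: c.max, shard: c.shard });
    });
'
```

In a healthy chunk map, each chunk's max equals the next chunk's min; any gap or overlap in that chain is the kind of inconsistency that can trip the balancer.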
| Comments |
| Comment by Greg Studer [ 20/Aug/12 ] |
|
Thanks for the info, and thanks for trying rc-1 - the problem you're running into is |
| Comment by Vladimir Poluyaktov [ 20/Aug/12 ] |
|
Hi Greg! Well, I did the following:

1. I dropped the oggiReporting.domain.mainStats collection. , {background : 1}))

Log files from all servers are attached to this issue (see logs.zip):
RPTSRVC-Zuse1b-S01.log - mongos server
RPTDB-CFG-Zuse1b-S01.log - config server
RPTDB-RS01-Zuse1b-S01.log - primary server in Replica Set RS01
RPTDB-RS02-Zuse1b-S01.log - primary server in Replica Set RS02
RPTDB-RS03-Zuse1b-S01.log - primary server in Replica Set RS03

I also attached a fresh dump of our config db, made right after I received the error (configdb.dmp.tgz).

Interestingly, we have already sharded four of our big collections. All of them were balanced just fine:

mongos> db.printShardingStatus()
shards:
{ "_id" : "RS01", "host" : "RS01/RPTDB-RS01-Zuse1b-S01.oggifinogi.com:27018,RPTDB-RS01-Zuse1c-S01.oggifinogi.com:27018,RPTDB-RS01-Zuse1d-S01.oggifinogi.com:27018" }
{ "_id" : "RS02", "host" : "RS02/RPTDB-RS02-Zuse1b-S01.oggifinogi.com:27018,RPTDB-RS02-Zuse1c-S01.oggifinogi.com:27018,RPTDB-RS02-Zuse1d-S01.oggifinogi.com:27018" }
{ "_id" : "RS03", "host" : "RS03/RPTDB-RS03-Zuse1b-S01.oggifinogi.com:27018,RPTDB-RS03-Zuse1c-S01.oggifinogi.com:27018,RPTDB-RS03-Zuse1d-S01.oggifinogi.com:27018" }
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "social", "partitioned" : false, "primary" : "RS01" }
{ "_id" : "test", "partitioned" : false, "primary" : "RS01" }
{ "_id" : "oggiReportingTest", "partitioned" : false, "primary" : "RS01" }
{ "_id" : "oggiReporting", "partitioned" : true, "primary" : "RS01" }
oggiReporting.cmSite chunks:

Only oggiReporting.domain.mainStats failed with the error, so I think it may just be a data issue. |
| Comment by Greg Studer [ 20/Aug/12 ] |
|
Thanks for the database dump - it looks like somehow there is confusion between the different mainStats collections. Is it possible to also send the full log files of the admin and balancing mongoses you're using? I'm trying to track down why the balancer selected a chunk from "oggiReporting.playlist.mainStats", which is apparently dropped, while "oggiReporting.domain.mainStats" is the collection being checked. Also, if you can reproduce this every time you restart mongos, logs covering the first few minutes after a fresh restart, while the error appears several times, would be very helpful. To do this you just need to start mongos with "-vv". If that is not an option, the full log files or any mongos logs you have available will be useful. Just to verify - did you restart the replica set servers (or just step down the primaries) after dropping the old collection which was erroring and before creating the new collection? |
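The verbose-restart step above can be sketched as follows; the config server address, port, and log path are hypothetical placeholders patterned on the host names in this ticket, not the reporter's actual values:

```shell
# A sketch only - configdb address, port, and logpath are placeholders.
# "-vv" raises the log verbosity so the balancer's chunk-selection
# decisions are written to the mongos log.
mongos -vv \
  --configdb RPTDB-CFG-Zuse1b-S01.oggifinogi.com:27019 \
  --port 27017 \
  --logpath /var/log/mongodb/mongos.log \
  --fork
```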
| Comment by Vladimir Poluyaktov [ 18/Aug/12 ] |
|
Config database dump (tar.gzip archive) |
| Comment by Vladimir Poluyaktov [ 18/Aug/12 ] |
|
Hi Spencer! The config db dump is attached. I tried to dump/drop/restore/shard the collection three times - each time I got the same error within a few minutes of sharding the collection. |
| Comment by Pavlo Grinchenko [ 17/Aug/12 ] |
|
We are migrating our non-sharded 2.0.6 deployment to a sharded 2.2.0-rc1 cluster. |
| Comment by Pavlo Grinchenko [ 17/Aug/12 ] |
|
Vladimir and I are describing the same situation; we both work at the same company. We will prepare a configuration server dump for you. |
| Comment by Spencer Brody (Inactive) [ 17/Aug/12 ] |
|
Do you see the same problem when running 2.0.7, or only in 2.2.0-rc1? Pavlo, are you also running 2.2.0-rc1? Could you attach a dump of your config database, made by running mongodump against a config server? I'd like to see if your chunk mappings somehow got messed up. If you'd rather not attach that to this publicly viewable ticket, you can create a ticket in the "Community Private" project, attach the dump there, then post a link to the Community Private ticket here. Tickets in the Community Private project are only viewable by the reporter and employees of 10gen. |
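Producing that dump can be sketched as follows; the config server hostname and output paths are hypothetical placeholders based on the naming used elsewhere in this ticket:

```shell
# A sketch, assuming a config server at the hypothetical host below,
# listening on the default config-server port 27019. Dumps only the
# "config" database, then packs it for attachment to the ticket.
mongodump --host RPTDB-CFG-Zuse1b-S01.oggifinogi.com --port 27019 \
  --db config --out ./configdump
tar czf configdb.dmp.tgz ./configdump
```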
| Comment by Pavlo Grinchenko [ 17/Aug/12 ] |
|
We had the following situation: the 2nd collection started to fail with the exception specified above. We tried to do the following work-around: We saw that during the restore process it already puts data into the shards - which is good. |