[SERVER-10662] Sharding stopped working on a collection Created: 02/Sep/13 Updated: 04/Apr/23 Resolved: 16/Dec/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Anthony Pastor | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | crash, mongos, sharding | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux debian 6.0.5 |
||
| Operating System: | Linux |
| Participants: |
| Description |
|
Hello,

We experienced a problem on our MongoDB sharded cluster. One of the 3 config servers failed recently:

Thu Aug 29 10:50:00 [CheckConfigServers] ERROR: config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ
config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ
config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ
config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ
config servers not in sync! config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ

To recover from this state, we did the following:

1) Disabled the balancer with :
2) Stopped the mongodb-conf daemon on the failed server (server1) and on a second server. We left the third server and its config server running.
3) Rsynced the configdb data from server2 to server1.
4) Restarted the mongodb-conf daemon on server2: ok
5) Restarted the mongodb-conf daemon on server1: ok
6) Re-enabled the balancer with : sh.setBalancerState(true)

Everything seemed ok, but we now see this error in the logs:

[Balancer] caught exception while doing balance: not sharded:rawlogs.raw_log

The collections seem to be present, but sharding is not working:

mongos> db.collections.find()
, "unique" : false, "lastmodEpoch" : ObjectId("515a78325c52d82fad24aa03") }
, "unique" : false, "lastmodEpoch" : ObjectId("51fbaf295c52d82fad24eccb") }
mongos> db.raw_log.stats()
,

Yet the sharding configuration still appears to be set as it was before:

mongos> sh.status()
shards:
    { "_id" : "shard1", "host" : "shard1/172.16.19.1:27018,172.16.19.2:27018" }
    { "_id" : "shard2", "host" : "shard2/172.16.19.3:27018,172.16.19.4:27018" }
    { "_id" : "shard3", "host" : "shard3/172.16.19.5:27018,172.16.19.6:27018" }
    { "_id" : "shard4", "host" : "shard4/172.16.19.7:27018,172.16.19.8:27018" }
    { "_id" : "shard5", "host" : "shard5/172.16.19.10:27018,172.16.19.9:27018" }
databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "rawlogs", "partitioned" : true, "primary" : "shard1" }
        rawlogs.raw_log chunks:
mongos>

We also tried repeating the procedure (recovering config server 1 from config server 2) with the mongos stopped.

Fri Aug 30 10:11:36 [Balancer] warning: got invalid chunk version 1|0||521f0c0563b2cfc94d8fad9b in document { _id: "rawlogs.raw_log-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('521f0c0563b2cfc94d8fad9b'), ns: "rawlogs.raw_log", min: { _id: MinKey }, max: { _id: BinData }, shard: "shard1" } when trying to load differing chunks at version 0|0||515a78325c52d82fad24aa03

We don't want sharding to be initialized from scratch. We have already tried to refresh the mongos with : )

Unfortunately, we did not preserve the contents of the crashed config server that we replaced.

Any ideas on how to resume sharding? |
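The Balancer warning above quotes two chunk version strings in the form `major|minor||epoch`. As a rough illustration of how they can be compared, here is a minimal sketch in plain JavaScript (ES5, so it should also paste into the mongo shell). The helper names `parseChunkVersion` and `sameEpoch` are hypothetical and not part of MongoDB. The sketch assumes that the epoch identifies one sharding incarnation of a collection and that versions carrying different epochs cannot be ordered against each other.

```javascript
// Hypothetical helper: parse a chunk version string of the form
// "major|minor||epoch", as printed in the Balancer warning.
function parseChunkVersion(text) {
  var match = /^(\d+)\|(\d+)\|\|([0-9a-fA-F]{24})$/.exec(text);
  if (match === null) {
    throw new Error("unrecognised chunk version: " + text);
  }
  return {
    major: parseInt(match[1], 10),
    minor: parseInt(match[2], 10),
    epoch: match[3].toLowerCase() // 24 hex characters of an ObjectId
  };
}

// Hypothetical helper: true when both version strings carry the same epoch.
// Assumption: only versions with equal epochs are comparable.
function sameEpoch(a, b) {
  return parseChunkVersion(a).epoch === parseChunkVersion(b).epoch;
}

// The two version strings quoted in the warning from the report.
var chunkVersion = "1|0||521f0c0563b2cfc94d8fad9b";
var loadedVersion = "0|0||515a78325c52d82fad24aa03";

// false here: the chunk document and the loaded version have different epochs.
var epochsMatch = sameEpoch(chunkVersion, loadedVersion);
```

For the strings in the report, `epochsMatch` is `false`. The epoch `515a78325c52d82fad24aa03` also appears as a `lastmodEpoch` in the `db.collections.find()` output, while `521f0c0563b2cfc94d8fad9b` appears only in the chunk document. The sketch only makes that mismatch visible and does not diagnose its cause.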
| Comments |
| Comment by Stennie Steneker (Inactive) [ 16/Dec/13 ] |
|
Hi Anthony, Apologies for the delayed response on this issue. The SERVER project is only intended for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group (http://groups.google.com/group/mongodb-user) or Stack Overflow / ServerFault. Given you reported this issue some time ago, I'm going to assume that you have since found a solution. Regards, |