Core Server / SERVER-10662

Sharding stopped working on a collection

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Major - P3
    • Affects Version/s: 2.2.3
    • Component/s: Sharding
    • Environment:
      Linux debian 6.0.5

      Hello,

      We experienced a problem on our MongoDB sharded cluster.
      We use MongoDB 2.2.3 on Debian Linux 6.0.5.

      One of the 3 config servers failed recently.
      The server came back online a few hours later.
      In the mongos logfile we could observe:

      Thu Aug 29 10:50:00 [CheckConfigServers] ERROR: config servers not in sync! config servers 172.16.16.1:27019 and 172.16.18.1:27019 differ
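
      (For reference, a rough way to see which config server actually diverged is to compare a hash of the config database on each of the three config servers; these are generic mongo shell commands, not a transcript of what we ran:)

      // Run against each config server directly (port 27019) and compare the
      // resulting hashes; the server whose hashes differ is the one out of sync.
      db.getSiblingDB("config").runCommand({ dbHash: 1 })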

      To recover from this state we did:

      1) Disabled the balancer (a quick balancer check is sketched after this list) with:
      sh.setBalancerState(false)

      2) Stopped the mongodb-conf daemon on the failed server (server1) and on a second server (server2); we left the third server and its config server running.

      3) Rsynced the configdb data from server2 to server1.

      4) Restarted the mongodb-conf daemon on server2: OK

      5) Restarted the mongodb-conf daemon on server1: OK

      6) Re-enabled the balancer with:

      sh.setBalancerState(true)
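
      (For reference, the quick balancer check mentioned in step 1, run from a mongos shell; generic commands, not an exact transcript of what we ran:)

      // sh.getBalancerState() reports whether the balancer is enabled,
      // sh.isBalancerRunning() reports whether a balancing round is still in flight,
      // and the config.locks document shows whether the balancer lock is held (state 0 = free).
      sh.getBalancerState()
      sh.isBalancerRunning()
      db.getSiblingDB("config").locks.findOne({ _id: "balancer" })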

      Everything seemed OK, but now we can see this issue in the logs:

      [Balancer] caught exception while doing balance: not sharded:rawlogs.raw_log

      The collections seem to be present, but sharding is not OK:

      mongos> db.collections.find()
      { "_id" : "rawlogs.raw_log", "lastmod" : ISODate("1970-01-16T19:08:22.332Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "lastmodEpoch" : ObjectId("515a78325c52d82fad24aa03") }
      { "_id" : "rawlogs.raw_log_ghost", "lastmod" : ISODate("1970-01-16T22:04:08.874Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "lastmodEpoch" : ObjectId("51fbaf295c52d82fad24eccb") }
      mongos>

      mongos> db.raw_log.stats()
      {
      "sharded" : false,
      "primary" : "shard1",
      "ns" : "rawlogs.raw_log",
      "count" : 2380607210,
      "size" : 1044708269072,
      "avgObjSize" : 438.84109259334724,
      "storageSize" : NumberLong("1116861681584"),
      "numExtents" : 541,
      "nindexes" : 1,
      "lastExtentSize" : 2146426864,
      "paddingFactor" : 1,
      "systemFlags" : 1,
      "userFlags" : 0,
      "totalIndexSize" : 80006256176,
      "indexSizes" :

      { "_id_" : 80006256176 }

      ,
      "ok" : 1
      }
      mongos>

      But apparently the sharding configuration is still set (as it was before):

      mongos> sh.status()
      --- Sharding Status ---
        sharding version: { "_id" : 1, "version" : 3 }
        shards:
          { "_id" : "shard1", "host" : "shard1/172.16.19.1:27018,172.16.19.2:27018" }
          { "_id" : "shard2", "host" : "shard2/172.16.19.3:27018,172.16.19.4:27018" }
          { "_id" : "shard3", "host" : "shard3/172.16.19.5:27018,172.16.19.6:27018" }
          { "_id" : "shard4", "host" : "shard4/172.16.19.7:27018,172.16.19.8:27018" }
          { "_id" : "shard5", "host" : "shard5/172.16.19.10:27018,172.16.19.9:27018" }
        databases:
          { "_id" : "admin", "partitioned" : false, "primary" : "config" }
          { "_id" : "rawlogs", "partitioned" : true, "primary" : "shard1" }
            rawlogs.raw_log chunks:
              shard1  24615
              shard2  8279
              shard3  3314
              shard4  3498
              shard5  10263
            too many chunks to print, use verbose if you want to force print
            rawlogs.raw_log_ghost chunks:
              shard1  368
              shard3  277
              shard2  277
              shard4  414
              shard5  1162
            too many chunks to print, use verbose if you want to force print
          { "_id" : "tempstats", "partitioned" : false, "primary" : "shard1" }
          { "_id" : "test", "partitioned" : false, "primary" : "shard3" }
          { "_id" : "stats", "partitioned" : false, "primary" : "shard5" }
          { "_id" : "rawlog", "partitioned" : false, "primary" : "shard5" }

      mongos>
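
      (The chunk documents that sh.status() is counting can also be inspected directly in the config database through the mongos; roughly, not an exact transcript:)

      // The count should match the per-shard totals printed by sh.status(),
      // and each chunk document carries a lastmodEpoch field.
      var cfg = db.getSiblingDB("config")
      cfg.chunks.count({ ns: "rawlogs.raw_log" })
      cfg.chunks.findOne({ ns: "rawlogs.raw_log" })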

      We also tried to repeat the procedure (recovery of config server 1 from config server 2) with the mongos stopped.
      It didn't help; when we restarted the mongos and re-enabled the balancer we could see this in the logs:

      Fri Aug 30 10:11:36 [Balancer] warning: got invalid chunk version 1|0||521f0c0563b2cfc94d8fad9b in document { _id: "rawlogs.raw_log-_id_MinKey", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('521f0c0563b2cfc94d8fad9b'), ns: "rawlogs.raw_log", min: { _id: MinKey }, max: { _id: BinData }, shard: "shard1" } when trying to load differing chunks at version 0|0||515a78325c52d82fad24aa03
      Fri Aug 30 10:11:36 [Balancer] warning: major change in chunk information found when reloading rawlogs.raw_log, previous version was 0|0||515a78325c52d82fad24aa03
      Fri Aug 30 10:11:36 [Balancer] ChunkManager: time to load chunks for rawlogs.raw_log: 48ms sequenceNumber: 2 version: 0|0||000000000000000000000000 based on: (empty)
      Fri Aug 30 10:11:36 [Balancer] warning: no chunks found for collection rawlogs.raw_log, assuming unsharded
      Fri Aug 30 10:11:36 [Balancer] ChunkManager: time to load chunks for rawlogs.raw_log_ghost: 31ms sequenceNumber: 3 version: 707|393||51fbaf295c52d82fad24eccb based on: (empty)
      Fri Aug 30 10:11:36 [Balancer] distributed lock 'balancer/mycompt.local:27021:1377850265:1804289383' unlocked.
      Fri Aug 30 10:11:36 [Balancer] scoped connection to 172.16.16.1:27019,172.16.18.1:27019,172.16.18.2:27019 not being returned to the pool
      Fri Aug 30 10:11:36 [Balancer] caught exception while doing balance: not sharded:rawlogs.raw_log
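
      (These warnings suggest the chunk documents now carry epoch 521f0c0563b2cfc94d8fad9b while the config.collections entry for rawlogs.raw_log still has epoch 515a78325c52d82fad24aa03; a rough way to confirm the mismatch from a mongos shell:)

      // Compare the epoch recorded for the collection with the epoch carried by its
      // chunk documents; when they differ, the mongos rejects the chunks and logs
      // "no chunks found for collection ... assuming unsharded".
      var cfg = db.getSiblingDB("config")
      cfg.collections.findOne({ _id: "rawlogs.raw_log" }).lastmodEpoch
      cfg.chunks.findOne({ ns: "rawlogs.raw_log" }).lastmodEpoch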

      We don't want sharding to start again 'from scratch';
      we'd like to restore continuity with the state before config server 1 failed.

      We've already tried to refresh the mongos with:

      db.adminCommand({ flushRouterConfig: 1 })

      without any better result.
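
      (As far as we understand, flushRouterConfig only clears the sharding metadata cache of the mongos that receives it; it does not change anything in the config database, so with several mongos processes it would have to be sent to each one:)

      // Sent to the admin database of each mongos; it only resets that router's
      // cached metadata and does not repair the config metadata itself.
      db.adminCommand({ flushRouterConfig: 1 })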

      Unfortunately we haven't preserved the contents of the crashed config server that we replaced.

      Any idea how we can resume sharding, please?

            Assignee:
            Unassigned
            Reporter:
            Anthony Pastor (anthony@stickyads.tv)
            Votes:
            0
            Watchers:
            3
