Details
Type: Question
Resolution: Done
Priority: Major - P3
Description
Hi MongoDB,
I'm running a sharded cluster with:
- 3 MongoS instances (version 2.4.5)
- 9 MongoD (3 nodes per shard: a primary, a secondary, and an arbiter; all version 2.4.5)
- 3 Config Servers (version 2.4.5)
When I attempt to shard a new collection on an existing DB I get the following:
sh.shardCollection("Customer.CustomerEventVisits",{a:1,b:1},true)
{
"code" : 8017,
"ok" : 0,
"errmsg" : "exception: update not consistent ns: config.chunks query: { _id: \"Customer.CustomerEventVisits-a_MinKeyb_MinKey\" } update: { _id: \"Customer.CustomerEventVisits-a_MinKeyb_MinKey\", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('5395c5793d5ce34d5ccd6823'), ns: \"Customer.CustomerEventVisits\", min: { a: MinKey, b: MinKey }, max: { a: MaxKey, b: MaxKey }, shard: \"j1shard\" } gle1: { updatedExisting: false, n: 1, lastOp: Timestamp 1402324345000|3, connectionId: 2032481, waited: 27, err: null, ok: 1.0 } gle2: { err: \"BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103\", code: 10334, n: 0, connectionId: 2031264, waited: 10, ok: 1.0
}
I'm also seeing the following logged on each config server:
Mon Jun 9 13:09:03.238 [LockPinger] warning: distributed lock pinger 'bomongodbc1n1:30003,bomongodbc1n2:30003,bomongodbc1n3:30003/bomongos02.csnzoo.com:30004:1396381269:1804289383' detected an exception while pinging. :: caused by :: update not consistent ns: config.lockpings query: { _id: "bomongos02.csnzoo.com:30004:1396381269:1804289383" } update: { $set: { ping: new Date(1402333743122) } } gle1: { updatedExisting: true, n: 1, lastOp: Timestamp 1402333743000|2, connectionId: 2035785, waited: 36, err: null, ok: 1.0 } gle2: { err: "BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103", code: 10334, n: 0, connectionId: 2034560, waited: 4, ok: 1.0 }
as well as entries like:
Jun 9 13:24:42 bomongodbc1n3 mongod.30003[32118]: Mon Jun 9 13:24:42.223 [conn2034890] update config.mongos query: { _id: "bomongos01.csnzoo.com:30004" } update: { $set: { ping: new Date(1402334682197), up: 5953440, waiting: true, mongoVersion: "2.4.5" } } idhack:1 fastmod:1 keyUpdates:0 exception: BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103 code:10334 locks(micros) w:25423 12ms
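One way to read the "BSONObj size" value in these errors: a BSON document starts with a 4-byte little-endian length, so an implausible size can be checked by decoding its bytes as ASCII. The sketch below does that for the reported value 1852404841. The helper name sizeToAscii is hypothetical and not part of the mongo shell.

```javascript
// Hypothetical helper: split a 32-bit size into its four little-endian
// bytes and render each as ASCII (non-printable bytes become ".").
function sizeToAscii(size) {
  var chars = [];
  for (var i = 0; i < 4; i++) {
    var byte = Math.floor(size / Math.pow(256, i)) % 256;
    chars.push(byte >= 32 && byte < 127 ? String.fromCharCode(byte) : ".");
  }
  return chars.join("");
}

// 1852404841 is the size reported in the errors above.
var decoded = sizeToAscii(1852404841); // "itin"
```

The four bytes decode to the printable text "itin", which suggests that something that was not a document length was read as one. That is consistent with a malformed or corrupt document, but it does not by itself show where the bad bytes came from.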
I believe my config server collections (lockpings and mongos) have bad data in them. In fact, when I look at the documents in each, there are old mongos entries for instances that no longer exist, and there are inconsistent lock times or entries when comparing across the 3 config servers.
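The cross-server comparison described above could be sketched as follows. This is a minimal, read-only illustration, not an official procedure: the function name diffByServer and the labels n1, n2, n3 are hypothetical, and it assumes the documents from each config server have already been fetched into plain arrays.

```javascript
// Hypothetical helper: report _ids whose documents are missing on, or differ
// between, config servers. docsByServer maps a server label to an array of
// documents taken from the same collection on that server.
function diffByServer(docsByServer) {
  var servers = Object.keys(docsByServer);
  var seen = {}; // _id -> { server label -> serialized document }
  servers.forEach(function (server) {
    docsByServer[server].forEach(function (doc) {
      var id = String(doc._id);
      seen[id] = seen[id] || {};
      // Assumption: JSON.stringify is good enough for comparison here.
      // It is sensitive to key order, so reordered fields count as a difference.
      seen[id][server] = JSON.stringify(doc);
    });
  });
  var report = [];
  Object.keys(seen).forEach(function (id) {
    var missingOn = servers.filter(function (s) { return !(s in seen[id]); });
    var versions = {};
    servers.forEach(function (s) {
      if (s in seen[id]) { versions[seen[id][s]] = true; }
    });
    var differs = Object.keys(versions).length > 1;
    if (missingOn.length > 0 || differs) {
      report.push({ _id: id, missingOn: missingOn, differs: differs });
    }
  });
  return report;
}

// Assumed way to gather input in the mongo shell (read-only, one call per server):
//   new Mongo("bomongodbc1n1:30003").getDB("config").lockpings.find().toArray()
```

Because lockpings and mongos documents carry ping times that change continuously, some differences are expected on a live cluster. The entries of interest are those missing on one or more servers, or those that belong to mongos instances that no longer exist.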
Any idea on how to resolve this?
It's a production instance, so I'm hesitant to make a change, and it doesn't sound like my config backups will help, since this has been going on for longer than my backup retention threshold.
Thanks so much!
Mike