Core Server / SERVER-14213

Config Server Corruption - BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB)

Details

    • Type: Question
    • Resolution: Done
    • Priority: Major - P3
    • Component: Sharding

    Description

      Hi MongoDB,

      I'm running a sharded cluster with:

      • 3 MongoS instances (version 2.4.5)
      • 9 MongoD (3 data nodes per shard, a primary, secondary and arbiter, all are version 2.4.5)
      • 3 Config Servers (version 2.4.5)

      When I attempt to shard a new collection on an existing DB I get the following:

      sh.shardCollection("Customer.CustomerEventVisits",{a:1,b:1},true)
      {
              "code" : 8017,
              "ok" : 0,
              "errmsg" : "exception: update not consistent  ns: config.chunks query: { _id: \"Customer.CustomerEventVisits-a_MinKeyb_MinKey\" } update: { _id: \"Customer.CustomerEventVisits-a_MinKeyb_MinKey\", lastmod: Timestamp 1000|0, lastmodEpoch: ObjectId('5395c5793d5ce34d5ccd6823'), ns: \"Customer.CustomerEventVisits\", min: { a: MinKey, b: MinKey }, max: { a: MaxKey, b: MaxKey }, shard: \"j1shard\" } gle1: { updatedExisting: false, n: 1, lastOp: Timestamp 1402324345000|3, connectionId: 2032481, waited: 27, err: null, ok: 1.0 } gle2: { err: \"BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103\", code: 10334, n: 0, connectionId: 2031264, waited: 10, ok: 1.0 
      }
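
      A note on the number in that error, offered as a sketch rather than a diagnosis: a BSON document begins with a 4-byte little-endian length, and 1852404841 is 0x6E697469, the same four bytes the message prints in memory order as 0x6974696E. Read as ASCII, those bytes spell "itin", and the reported type byte 103 is the letter "g". That would be consistent with plain text being read where a length prefix was expected, but nothing here confirms the cause. The helper name below is hypothetical.

```javascript
// Hypothetical helper: split a 32-bit value into its four little-endian
// bytes (the order BSON stores its length prefix in) and render them as ASCII.
function int32ToLittleEndianAscii(n) {
  const bytes = [
    n & 0xff,
    (n >>> 8) & 0xff,
    (n >>> 16) & 0xff,
    (n >>> 24) & 0xff
  ];
  return bytes.map(function (b) { return String.fromCharCode(b); }).join("");
}

const size = 1852404841;                       // size reported in the error
const sizeHex = size.toString(16);             // big-endian hex of the same value
const text = int32ToLittleEndianAscii(size);   // bytes in memory order, as ASCII
const typeChar = String.fromCharCode(103);     // "?type=103" from the error

console.log(sizeHex, JSON.stringify(text), JSON.stringify(typeChar));
```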

      I'm also seeing the following logged on each config server:

      Mon Jun  9 13:09:03.238 [LockPinger] warning: distributed lock pinger 'bomongodbc1n1:30003,bomongodbc1n2:30003,bomongodbc1n3:30003/bomongos02.csnzoo.com:30004:1396381269:1804289383' detected an exception while pinging. :: caused by :: update not consistent  ns: config.lockpings query: { _id: "bomongos02.csnzoo.com:30004:1396381269:1804289383" } update: { $set: { ping: new Date(1402333743122) } } gle1: { updatedExisting: true, n: 1, lastOp: Timestamp 1402333743000|2, connectionId: 2035785, waited: 36, err: null, ok: 1.0 } gle2: { err: "BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103", code: 10334, n: 0, connectionId: 2034560, waited: 4, ok: 1.0 }

      as well as entries like:

      Jun  9 13:24:42 bomongodbc1n3 mongod.30003[32118]: Mon Jun  9 13:24:42.223 [conn2034890] update config.mongos query: { _id: "bomongos01.csnzoo.com:30004" } update: { $set: { ping: new Date(1402334682197), up: 5953440, waiting: true, mongoVersion: "2.4.5" } } idhack:1 fastmod:1 keyUpdates:0 exception: BSONObj size: 1852404841 (0x6974696E) is invalid. Size must be between 0 and 16793600(16MB) First element: : ?type=103 code:10334 locks(micros) w:25423 12ms

      I believe my config server collections (lockpings and mongos) have bad data in them. In fact, when I look at the documents in each, there are old mongos entries for instances that no longer exist, and the lock times and entries are inconsistent when I compare them across the 3 config servers.
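
      One way to narrow down which collections actually differ would be to compare per-collection hashes from each config server. The `dbHash` command returns a `collections` map of collection name to MD5, and the sketch below compares three such maps. It is only an illustration: the function name is hypothetical and the hash strings are made-up placeholders, not values from this cluster.

```javascript
// Given dbHash-style results keyed by config server host, return the names of
// collections whose hash is not identical on every server.
function findMismatchedCollections(resultsByHost) {
  const hosts = Object.keys(resultsByHost);
  const names = {};
  hosts.forEach(function (host) {
    Object.keys(resultsByHost[host].collections).forEach(function (coll) {
      names[coll] = true;
    });
  });
  return Object.keys(names).filter(function (coll) {
    const first = resultsByHost[hosts[0]].collections[coll];
    // A collection missing on one server compares as undefined and is flagged too.
    return hosts.some(function (host) {
      return resultsByHost[host].collections[coll] !== first;
    });
  }).sort();
}

// Placeholder hashes for illustration only. In the mongo shell each entry could
// come from something like:
//   new Mongo("bomongodbc1n1:30003").getDB("config").runCommand({ dbHash: 1 })
const sample = {
  "bomongodbc1n1:30003": { collections: { chunks: "aaa", lockpings: "bbb", mongos: "ccc" } },
  "bomongodbc1n2:30003": { collections: { chunks: "aaa", lockpings: "bbb", mongos: "ddd" } },
  "bomongodbc1n3:30003": { collections: { chunks: "aaa", lockpings: "eee", mongos: "ccc" } }
};

const mismatched = findMismatchedCollections(sample);
console.log(mismatched.join(", "));
```

      Collections that hash identically everywhere can be set aside, which leaves a shorter list to inspect by hand.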

      Any idea on how to resolve this?

      It's a production instance, so I'm hesitant to make a change, and it doesn't sound like my config backups will help, since this has been going on for longer than my backup retention window...

      Thanks so much!
      Mike

          People

            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            amarettoslim Mike