Core Server / SERVER-7922

All operations blocked on one sharded collection

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.2.2
    • Component/s: Performance, Sharding, Stability
    • Labels: None
    • Environment: Linux (CentOS 5/6)
    • Description:

      Every morning since last week, all operations on one sharded collection have been failing.

      Here are the application-side errors:
      setShardVersion failed host: mdbcis4-01-sv.criteo.prod:27021

      { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ns: "counters.statistics", version: Timestamp 4000|3, versionEpoch: ObjectId('000000000000000000000000'), globalVersion: Timestamp 6000|0, globalVersionEpoch: ObjectId('000000000000000000000000'), reloadConfig: true, errmsg: "shard global version for collection is higher than trying to set to 'counters.statistics'", ok: 0.0 }

      '. (Response was { "err" : "setShardVersion failed host: mdbcis4-01-sv.criteo.prod:27021

      { oldVersion: Timestamp 0|0, oldVersionEpoch: ObjectId('000000000000000000000000'), ns: \"counters.statistics\", version: Timestamp 4000|3, versionEpoch: ObjectId('000000000000000000000000'), globalVersion: Timestamp 6000|0, globalVersionEpoch: ObjectId('000000000000000000000000'), reloadConfig: true, errmsg: \"shard global version for collection is higher than trying to set to 'counters.statistics'\", ok: 0.0 }

      ", "code" : 10429, "n" : 0, "ok" : 1.0 }). : MongoDB.Driver.SafeModeResult SendMessage(MongoDB.Driver.Internal.MongoRequestMessage, MongoDB.Driver.SafeMode)

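      The message shows mongos trying to set version 4|3 for counters.statistics while the shard already holds version 6|0, i.e. the shard's in-memory version is ahead of what mongos expects. As a rough diagnostic (a sketch from the mongo shell; getShardVersion is an internal command, so its output fields may differ between versions), the two sides can be compared like this:

        // Highest chunk version the config servers record for the
        // collection (run through mongos or against a config server):
        db.getSiblingDB("config").chunks.find({ ns: "counters.statistics" })
                                        .sort({ lastmod: -1 }).limit(1)

        // Version the shard primary itself holds (run directly against
        // mdbcis4-01-sv.criteo.prod:27021):
        db.adminCommand({ getShardVersion: "counters.statistics" })
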
      Server-side errors:
      warning: aborted moveChunk because official version less than mine?: official 5|1||000000000000000000000000 mine: 6|0||000000000000000000000000
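
      This warning suggests the donor shard and the config servers disagree about the collection version. The recent migration history for the collection can be pulled from the config changelog (a sketch; the same changelog data is in the attachments):

        // Last migrations recorded for the collection, newest first.
        // "what" values include moveChunk.start / .commit / .from / .to:
        db.getSiblingDB("config").changelog.find({
            ns: "counters.statistics",
            what: /^moveChunk/
        }).sort({ time: -1 }).limit(20)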

      Restarting mongod unblocks operations until the next morning.
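
      For reference, only the router-side cache can be reset without a restart: flushRouterConfig forces a mongos to reload chunk metadata from the config servers. It does not touch the mongod's in-memory sharding state, which is where the stale version appears to live in this incident, so it may not help here (a hedged sketch):

        // Run on each mongos; reloads routing metadata from the config
        // servers. Does not reset the shard's own in-memory version.
        db.adminCommand({ flushRouterConfig: 1 })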

      Attached are the logs of the servers involved in the moveChunk process (shard4 to shard6), the sh_status output, and a dump of the changelog collection.
      In the logs, the issue starts at Wed Dec 12 06:47:09 and ends at Wed Dec 12 09:30:00 after the restart.

      Link to our MMS dashboard: https://mms.10gen.com/host/list/4f8d732587d1d86fa8b99c12
      The problem occurred before we added the 8th shard and seems to be linked to previous bugs: https://jira.mongodb.org/browse/SERVER-7034 and https://jira.mongodb.org/browse/SERVER-7821

      Attachments:
        1. changelog (10.81 MB)
        2. config_dump.tar (36.23 MB)
        3. mongod-shard4_20121211.zip (1.48 MB)
        4. mongod-shard4-shard6-logs.zip (1.38 MB)
        5. sh_status (6 kB)

            Assignee: David Hows
            Reporter: Klébert Hodin (k.hodin@criteo.com)
            Votes: 1
            Watchers: 8
