Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-3951

setShardVersion failed host errors

    • Type: Icon: Question Question
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 1.8.2
    • Component/s: Sharding
    • Labels:
      None
    • Environment:
      CentOS release 5.6 (Final) x86_64 x86_64 x86_64 GNU/Linux

      Hello,
      I got the following errors, when the application try to insert the data in DB. These error started appearing on initial load but after few minutes everything backs to normal.

      I found that there were three 500 application errors ( one on each server except its-d01) I looked at the tomcat logs and saw the same three errors which were complaining about mongos. The time of these error entries matches exactly with the error These issues that I noticed got fixed automatically without any changes/intervention from my side. However, what I am not able to comprehend is that if the errors show up in apache logs. tomcat logs and mongos logs, how did the data pertaining to these usernames(a17828, a052l508) still end up in the DB - in spite of these critical errors. Not sure why it happened and how to prevent it from happening in the future.

      Here is the current configuration:-

      Shards 2
      Configuration Servers :- 3
      Mongos :- 4

      > db.printShardingStatus()
      — Sharding Status —
      sharding version:

      { "_id" : 1, "version" : 3 }

      shards:
      {
      "_id" : "shard01",
      "host" : "shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308"
      }
      {
      "_id" : "shard02",
      "host" : "shard02/its-d02.db.aol.com:4308,its-m02.db.aol.com:4308"
      }
      databases:

      { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "test", "partitioned" : false, "primary" : "shard01" } { "_id" : "historyDb", "partitioned" : true, "primary" : "shard01" }

      historyDb.History chunks:
      shard02 29
      shard01 29
      too many chunksn to print, use verbose if you want to force print

      { "_id" : "reputationDb", "partitioned" : true, "primary" : "shard01" }

      reputationDb.Karma chunks:
      shard02 40
      shard01 41
      too many chunksn to print, use verbose if you want to force print

      shard01:PRIMARY> rs.status()
      {
      "set" : "shard01",
      "date" : ISODate("2011-09-22T17:58:13Z"),
      "myState" : 1,
      "members" : [
      {
      "_id" : 0,
      "name" : "its-d01.db.aol.com:4308",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "optime" :

      { "t" : 1316714290000, "i" : 2 }

      ,
      "optimeDate" : ISODate("2011-09-22T17:58:10Z"),
      "self" : true
      },
      {
      "_id" : 1,
      "name" : "its-m01.db.aol.com:4308",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 311026,
      "optime" :

      { "t" : 1316714290000, "i" : 2 }

      ,
      "optimeDate" : ISODate("2011-09-22T17:58:10Z"),
      "lastHeartbeat" : ISODate("2011-09-22T17:58:12Z")
      },
      {
      "_id" : 2,
      "name" : "its-m02.db.aol.com:4309",
      "health" : 1,
      "state" : 7,
      "stateStr" : "ARBITER",
      "uptime" : 311026,
      "optime" :

      { "t" : 0, "i" : 0 }

      ,
      "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2011-09-22T17:58:12Z")
      }

      > db.stats()
      {
      "raw" : {
      "shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308" :

      { "db" : "test", "collections" : 3, "objects" : 5, "avgObjSize" : 40, "dataSize" : 200, "storageSize" : 7168, "numExtents" : 3, "indexes" : 1, "indexSize" : 8192, "fileSize" : 201326592, "ok" : 1 }

      },
      "objects" : 5,
      "avgObjSize" : 40,
      "dataSize" : 200,
      "storageSize" : 7168,
      "numExtents" : 3,
      "indexes" : 1,
      "indexSize" : 8192,
      "fileSize" : 201326592,
      "ok" : 1
      }

      Here are the logs to support my research.

      ITS-M01

      -----------

      Apache logs:-

      reputation.aol.com 172.19.150.23 - - [20/Sep/2011:15:50:39 --0400] "GET /its/api/submit_claim?username=a052l508&user_id=a052l508&context=login&source=imagination&reason=evil&confidence=1&value=fraud&ip=172.19.150.23&locale=en-US HTTP/1.1" 500 2494 "" "" Tnjuj6wVzHQAAGK2M-gAAAIt "-" /20110920/20110920-1550/20110920-155039-Tnjuj6wVzHQAAGK2M-gAAAIt 0 997 md5:0474fc81293193ac2f05ed83a2200993

      Tomcat Logs:-

      ----------------------

      2011-09-20 15:50:39,046 [TP-Processor37] DEBUG com.aol.trust.model.dao.KarmaDAO$1 - Executing BasicDBObjectCommand { "username" : { "$in" : [ "a052l508"]}} for reputationDb.Karma cursor

      2011-09-20 15:50:39,048 [TP-Processor37] DEBUG com.aol.trust.model.dao.KarmaDAO$1 - MongoDB query plan com.aol.trust.model.Karma$DirtyableDBObject

      $1@1cad1378

      cursor = "null"

      nscanned = "0"

      n = "0"

      millis = "null"

      2011-09-20 15:50:39,050 [TP-Processor37] DEBUG com.aol.trust.model.dao.KarmaDAO$1 - Saving Karma identified by {_id=4e78ee8f6db5deac51214377, user

      name=a052l508} with document {createTime=Tue Sep 20 15:50:39 EDT 2011, __olv=1, username=a052l508, _id=4e78ee8f6db5deac51214377, lastUpdate=Tue Sep

      20 15:50:39 EDT 2011, userId=a052l508, state=fraud, contexts=

      {login=com.aol.trust.model.Context@26d939fd}

      , lastContext=login}

      2011-09-20 15:50:39,052 [TP-Processor37] INFO com.aol.trust.timer.db.write.karma - ITS_DB_WRITE_KARMA_TIME - [a052l508] 2 millisecs

      2011-09-20 15:50:39,052 [TP-Processor37] DEBUG com.aol.trust.model.dao.HistoryDAO$2 - Saving History identified by {_id=4e78ee8f6db5deac51214378,

      username=a052l508} with document {__olv=1, username=a052l508, time=Tue Sep 20 15:50:39 EDT 2011, scope=login, _id=4e78ee8f6db5deac51214378, source=

      imagination, reason=evil, userId=a052l508, xff=null, value=fraud, confidence=1, ip=172.19.150.23}

      2011-09-20 15:50:39,117 [TP-Processor37] ERROR org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/its].[ApiServlet] - Servlet.service

      () for servlet ApiServlet threw exception

      com.mongodb.MongoException: setShardVersion failed host[shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308]

      { oldVersion: Timestamp 2000|5, as sertion: "assertion s/d_state.cpp:501", errmsg: "db assertion failure", ok: 0.0 }

      at com.mongodb.CommandResult.getException(CommandResult.java:82)

      at com.mongodb.CommandResult.throwOnError(CommandResult.java:116)

      at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:126)

      at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:148)

      at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:132)

      at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:343)

      at org.iternine.jeppetto.dao.mongodb.MongoDBQueryModelDAO.trueSave(MongoDBQueryModelDAO.java:588)

      at org.iternine.jeppetto.dao.mongodb.MongoDBQueryModelDAO.save(MongoDBQueryModelDAO.java:257)

      at com.aol.trust.IdentityTrustServiceImplementation.saveToHistory(IdentityTrustServiceImplementation.java:310)

      at com.aol.trust.IdentityTrustServiceImplementation.submitClaim(IdentityTrustServiceImplementation.java:133)

      at com.aol.trust.SubmitClaimHandler.processRequest(SubmitClaimHandler.java:73)

      at com.aol.trust.ApiServlet.processApiCall(ApiServlet.java:131)

      at com.aol.trust.ApiServlet.doGet(ApiServlet.java:104)

      at javax.servlet.http.HttpServlet.service(HttpServlet.java:627)

      at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)

      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)

      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)

      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:204)

      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)

      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)

      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)

      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)

      Mongos

      ----------

      Tue Sep 20 15:50:39 [conn8] setShardVersion failed host[shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308]

      { oldVersion: Timestamp 2000|5, asserti on: "assertion s/d_state.cpp:501", errmsg: "db assertion failure", ok: 0.0 }

      Tue Sep 20 15:50:39 [conn8] Assertion: 10429:setShardVersion failed host[shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308]

      { oldVersion: Timestamp 200 0|5, assertion: "assertion s/d_state.cpp:501", errmsg: "db assertion failure", ok: 0.0 }

      0x5204fa 0x6a15ed 0x6a1152 0x6a1152 0x6a1152 0x578ce6 0x57715a 0x6311ed 0x6395bc 0x63d433 0x6682f2 0x67d187 0x580b7c 0x6a50a0 0x321420673d 0x32136d44bd

      /data/servers/mongodb/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x12a) [0x5204fa]

      /data/servers/mongodb/bin/mongos [0x6a15ed]

      /data/servers/mongodb/bin/mongos [0x6a1152]

      /data/servers/mongodb/bin/mongos [0x6a1152]

      /data/servers/mongodb/bin/mongos [0x6a1152]

      /data/servers/mongodb/bin/mongos(_ZN5boost6detail8function17function_invoker4IPFbRN5mongo12DBClientBaseERKSsbiEbS5_S7_biE6invokeERNS1_15function_bufferES5_S

      7_bi+0x16) [0x578ce6]

      /data/servers/mongodb/bin/mongos(_ZN5mongo15ShardConnection11_finishInitEv+0xba) [0x57715a]

      /data/servers/mongodb/bin/mongos(_ZN5mongo8Strategy7doWriteEiRNS_7RequestERKNS_5ShardEb+0xdd) [0x6311ed]

      /data/servers/mongodb/bin/mongos(_ZN5mongo13ShardStrategy7_updateERNS_7RequestERNS_9DbMessageEN5boost10shared_ptrINS_12ChunkManagerEEE+0xd1c) [0x6395bc]

      /data/servers/mongodb/bin/mongos(_ZN5mongo13ShardStrategy7writeOpEiRNS_7RequestE+0x2b3) [0x63d433]

      /data/servers/mongodb/bin/mongos(_ZN5mongo7Request7processEi+0x172) [0x6682f2]

      /data/servers/mongodb/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x77) [0x67d187]

      /data/servers/mongodb/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x34c) [0x580b7c]

      /data/servers/mongodb/bin/mongos(thread_proxy+0x80) [0x6a50a0]

      /lib64/libpthread.so.0 [0x321420673d]

      /lib64/libc.so.6(clone+0x6d) [0x32136d44bd]

      Tue Sep 20 15:50:39 [conn8] ~ScopedDBConnection: _conn != null

      Tue Sep 20 15:50:39 [conn8] AssertionException in process: setShardVersion failed host[shard01/its-d01.db.aol.com:4308,its-m01.db.aol.com:4308]

      { oldVersion: Timestamp 2000|5, assertion: "assertion s/d_state.cpp:501", errmsg: "db assertion failure", ok: 0.0 }

      Mongod

      ------------

      > db.Karma.find(

      {"username" : "a052l508"}

      )

      { "id" : ObjectId("4e78ee8f6db5deac51214377"), "createTime" : ISODate("2011-09-20T19:50:39.046Z"), "_olv" : 1, "username" : "a052l508", "lastUpdate" : ISODate("2011-09-20T19:50:39.046Z"), "userId" : "a052l508", "state" : "fraud", "contexts" : { "login" :

      { "time" : ISODate("2011-09-20T19:50:39.046Z"), "scope" : "login", "source" : "imagination", "reason" : "evil", "xff" : null, "value" : "fraud", "confidence" : "1", "ip" : "172.19.150.23" }

      }, "lastContext" : "login" }

            Assignee:
            greg_10gen Greg Studer
            Reporter:
            ajay.batheja@teamaol.com Ajay Batheja
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: