Core Server / SERVER-3120

mongos core dumps when mongodump is dumping from a replica set where 2 out of 3 members are in RECOVERING state

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.8.2
    • Affects Version/s: 1.8.1
    • Component/s: Sharding
    • Labels: None
    • Operating System: ALL

      Problem:
      During a mongodump, the mongos process it was connected to died. The following was seen in the logs:

      Thu May 19 02:01:07 [LockPinger] dist_lock pinged successfully for: us0101aej024.tangome.gbl:1305761193:1804289383
      Thu May 19 02:04:21 [mongosMain] connection accepted from 127.0.0.1:39876 #8
      Thu May 19 02:04:33 [conn8] got not master for: us0101amd206
      Thu May 19 02:04:39 [conn8] end connection 127.0.0.1:39876
      Received signal 6
      Backtrace: 0x52e235 0x301ac302d0 0x301ac30265 0x301ac31d10 0x301ac296e6 0x5517c3 0x552208 0x54a10c 0x53fcca 0x577eae 0x5789dc 0x69dae1 0x69ec38 0x301b40673d 0x301acd3f6d
      /local/mongo/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
      /lib64/libc.so.6[0x301ac302d0]
      /lib64/libc.so.6(gsignal+0x35)[0x301ac30265]
      /lib64/libc.so.6(abort+0x110)[0x301ac31d10]
      /lib64/libc.so.6(__assert_fail+0xf6)[0x301ac296e6]
      /local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet11checkMasterEv+0x4b3)[0x5517c3]
      /local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x128)[0x552208]
      /local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x8c)[0x54a10c]
      /local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0x12a)[0x53fcca]
      /local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
      /local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
      /local/mongo/bin/mongos(tls_destructor+0xb1)[0x69dae1]
      /local/mongo/bin/mongos(thread_proxy+0x88)[0x69ec38]
      /lib64/libpthread.so.0[0x301b40673d]
      /lib64/libc.so.6(clone+0x6d)[0x301acd3f6d]
      ===
      Received signal 11
      Backtrace: 0x52e235 0x301ac302d0 0x53fc8d 0x577eae 0x5789dc 0x69f3c1 0x573b22 0x301ac333a5 0x52e29b 0x301ac302d0 0x301ac30265 0x301ac31d10 0x301ac296e6 0x5517c3 0x552208 0x54a10c 0x53fcca 0x577eae 0x5789dc 0x69dae1
      /local/mongo/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
      /lib64/libc.so.6[0x301ac302d0]
      /local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0xed)[0x53fc8d]
      /local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
      /local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
      /local/mongo/bin/mongos(_ZN5boost6detail12set_tss_dataEPKvNS_10shared_ptrINS0_20tss_cleanup_functionEEEPvb+0x151)[0x69f3c1]
      /local/mongo/bin/mongos[0x573b22]
      /lib64/libc.so.6(exit+0xe5)[0x301ac333a5]
      /local/mongo/bin/mongos[0x52e29b]
      /lib64/libc.so.6[0x301ac302d0]
      /lib64/libc.so.6(gsignal+0x35)[0x301ac30265]
      /lib64/libc.so.6(abort+0x110)[0x301ac31d10]
      /lib64/libc.so.6(__assert_fail+0xf6)[0x301ac296e6]
      /local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet11checkMasterEv+0x4b3)[0x5517c3]
      /local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x128)[0x552208]
      /local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x8c)[0x54a10c]
      /local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0x12a)[0x53fcca]
      /local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
      /local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
      /local/mongo/bin/mongos(tls_destructor+0xb1)[0x69dae1]
      ===
      Thu May 19 02:04:39 CursorCache at shutdown - sharded: 1 passthrough: 0
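
      The two backtraces suggest the following sequence: the thread's ClientConnections cache is torn down through a boost::thread_specific_ptr cleanup function, ClientConnections::release sends a command over the replica-set connection, DBClientReplicaSet::checkMaster fails an assert because no usable master can be found (the log shows the primary answering "not master", and rs.status() below shows the other two members RECOVERING), and the resulting abort (signal 6) runs exit handlers that appear to re-enter the same TLS cleanup and crash again (signal 11). The sketch below is only a minimal standalone model of that pattern, not the mongos code; the names Member, ConnCache, and findUsableMaster are hypothetical, and C++11 thread_local stands in for boost::thread_specific_ptr.

      // Hypothetical, minimal model of the crash pattern in the backtraces: an
      // assert that fires inside thread-local cleanup takes down the whole process.
      // None of these names come from the mongos source; C++11 thread_local is used
      // here in place of boost::thread_specific_ptr.
      #include <cassert>
      #include <cstdio>
      #include <string>
      #include <thread>
      #include <vector>

      // Replica-set member state as reported by rs.status()
      // (1 = PRIMARY, 2 = SECONDARY, 3 = RECOVERING).
      struct Member { std::string name; int state; };

      // Stand-in for DBClientReplicaSet::checkMaster(): find a usable master or assert.
      const Member& findUsableMaster(const std::vector<Member>& members) {
          for (const Member& m : members)
              if (m.state == 1)                      // only a PRIMARY qualifies
                  return m;
          assert(false && "no master found");        // -> abort(), i.e. signal 6
          return members.front();                    // never reached with asserts on
      }

      // Stand-in for the per-thread ClientConnections cache kept by mongos.
      struct ConnCache {
          std::vector<Member> members;
          // Models ClientConnections::release(): it talks to the replica set again
          // while the thread is being torn down, so the assert fires during cleanup.
          ~ConnCache() {
              const Member& master = findUsableMaster(members);
              std::printf("released connection via %s\n", master.name.c_str());
          }
      };

      // Plays the role of boost::thread_specific_ptr<ClientConnections> in mongos:
      // its destructor runs exactly when the worker thread exits.
      thread_local ConnCache cache;

      int main() {
          std::thread t([] {
              // Two of three members are RECOVERING; state 3 for the third models the
              // moment the remaining node answered "not master" (see the log above).
              cache.members = { {"us0101amd106", 3}, {"us0101amd206", 3}, {"us0101amd306", 3} };
              // The thread ends here; the thread-local destructor asserts and aborts
              // the entire process, not just this one worker thread.
          });
          t.join();
          std::printf("not reached when the assert fires during thread cleanup\n");
      }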

      Looking at rs.status(), one of the replica sets was in a bad state:

      xxxxxSet6:PRIMARY> rs.status();
      {
          "set" : "xxxxxSet6",
          "date" : ISODate("2011-05-19T03:45:15Z"),
          "myState" : 1,
          "members" : [
              {
                  "_id" : 0,
                  "name" : "us0101amd106.xxxxx.gbl:27017",
                  "health" : 1,
                  "state" : 3,
                  "stateStr" : "RECOVERING",
                  "uptime" : 535081,
                  "optime" : { "t" : 1305494831000, "i" : 43 },
                  "optimeDate" : ISODate("2011-05-15T21:27:11Z"),
                  "lastHeartbeat" : ISODate("2011-05-19T03:45:14Z"),
                  "errmsg" : "error RS102 too stale to catch up"
              },
              {
                  "_id" : 1,
                  "name" : "us0101amd206",
                  "health" : 1,
                  "state" : 1,
                  "stateStr" : "PRIMARY",
                  "optime" : { "t" : 1305776715000, "i" : 3 },
                  "optimeDate" : ISODate("2011-05-19T03:45:15Z"),
                  "self" : true
              },
              {
                  "_id" : 2,
                  "name" : "us0101amd306",
                  "health" : 1,
                  "state" : 3,
                  "stateStr" : "RECOVERING",
                  "uptime" : 535372,
                  "optime" : { "t" : 1305528927000, "i" : 13 },
                  "optimeDate" : ISODate("2011-05-16T06:55:27Z"),
                  "lastHeartbeat" : ISODate("2011-05-19T03:45:14Z"),
                  "errmsg" : "error RS102 too stale to catch up"
              }
          ],
          "ok" : 1
      }
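
      Only members in state 1 (PRIMARY) or 2 (SECONDARY) are usable by clients; with both stale members stuck in state 3 (RECOVERING, "error RS102 too stale to catch up"), everything hinges on _id 1, and a single transient "not master" answer from it (as in the mongos log above) leaves a replica-set connection with no candidates at all. A small sketch working through the member states shown above; the MemberStatus struct and usable() helper are hypothetical, not driver code.

      // Hypothetical walk-through of the rs.status() output above: count how many
      // members a client could actually use. Only PRIMARY (1) and SECONDARY (2)
      // states qualify; RECOVERING (3) members cannot serve reads or writes.
      #include <cstdio>
      #include <string>
      #include <vector>

      struct MemberStatus {
          int id;
          std::string name;
          int state;                 // 1 = PRIMARY, 2 = SECONDARY, 3 = RECOVERING
          std::string errmsg;
      };

      static bool usable(const MemberStatus& m) { return m.state == 1 || m.state == 2; }

      int main() {
          // Member states exactly as reported by rs.status() above.
          std::vector<MemberStatus> members = {
              {0, "us0101amd106.xxxxx.gbl:27017", 3, "error RS102 too stale to catch up"},
              {1, "us0101amd206",                 1, ""},
              {2, "us0101amd306",                 3, "error RS102 too stale to catch up"},
          };

          int usableCount = 0;
          for (const MemberStatus& m : members) {
              std::printf("%-30s state=%d usable=%s  %s\n", m.name.c_str(), m.state,
                          usable(m) ? "yes" : "no", m.errmsg.c_str());
              if (usable(m)) ++usableCount;
          }

          // Prints "usable members: 1 of 3": a single flap of the primary is enough
          // to leave a client-side replica-set connection with zero usable nodes.
          std::printf("usable members: %d of %zu\n", usableCount, members.size());
      }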

            Assignee: Greg Studer (greg_10gen)
            Reporter: Alvin Richards (alvin, Inactive)
            Votes: 0
            Watchers: 2
