[SERVER-3120] mongos core dumps when mongodump is dumping out from a replicaset where 2 out of 3 are in RECOVERING state Created: 19/May/11  Updated: 12/Jul/16  Resolved: 31/May/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.1
Fix Version/s: 1.8.2

Type: Bug Priority: Major - P3
Reporter: Alvin Richards (Inactive) Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL

 Description   

Problem:
During a mongodump, the mongos process it was connected to died. The following was seen in the logs:

Thu May 19 02:01:07 [LockPinger] dist_lock pinged successfully for: us0101aej024.tangome.gbl:1305761193:1804289383
Thu May 19 02:04:21 [mongosMain] connection accepted from 127.0.0.1:39876 #8
Thu May 19 02:04:33 [conn8] got not master for: us0101amd206
Thu May 19 02:04:39 [conn8] end connection 127.0.0.1:39876
Received signal 6
Backtrace: 0x52e235 0x301ac302d0 0x301ac30265 0x301ac31d10 0x301ac296e6 0x5517c3 0x552208 0x54a10c 0x53fcca 0x577eae 0x5789dc 0x69dae1 0x69ec38 0x301b40673d 0x301acd3f6d
/local/mongo/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib64/libc.so.6[0x301ac302d0]
/lib64/libc.so.6(gsignal+0x35)[0x301ac30265]
/lib64/libc.so.6(abort+0x110)[0x301ac31d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x301ac296e6]
/local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet11checkMasterEv+0x4b3)[0x5517c3]
/local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x128)[0x552208]
/local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x8c)[0x54a10c]
/local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0x12a)[0x53fcca]
/local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
/local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
/local/mongo/bin/mongos(tls_destructor+0xb1)[0x69dae1]
/local/mongo/bin/mongos(thread_proxy+0x88)[0x69ec38]
/lib64/libpthread.so.0[0x301b40673d]
/lib64/libc.so.6(clone+0x6d)[0x301acd3f6d]
===
Received signal 11
Backtrace: 0x52e235 0x301ac302d0 0x53fc8d 0x577eae 0x5789dc 0x69f3c1 0x573b22 0x301ac333a5 0x52e29b 0x301ac302d0 0x301ac30265 0x301ac31d10 0x301ac296e6 0x5517c3 0x552208 0x54a10c 0x53fcca 0x577eae 0x5789dc 0x69dae1
/local/mongo/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib64/libc.so.6[0x301ac302d0]
/local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0xed)[0x53fc8d]
/local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
/local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
/local/mongo/bin/mongos(_ZN5boost6detail12set_tss_dataEPKvNS_10shared_ptrINS0_20tss_cleanup_functionEEEPvb+0x151)[0x69f3c1]
/local/mongo/bin/mongos[0x573b22]
/lib64/libc.so.6(exit+0xe5)[0x301ac333a5]
/local/mongo/bin/mongos[0x52e29b]
/lib64/libc.so.6[0x301ac302d0]
/lib64/libc.so.6(gsignal+0x35)[0x301ac30265]
/lib64/libc.so.6(abort+0x110)[0x301ac31d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x301ac296e6]
/local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet11checkMasterEv+0x4b3)[0x5517c3]
/local/mongo/bin/mongos(_ZN5mongo18DBClientReplicaSet7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x128)[0x552208]
/local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x8c)[0x54a10c]
/local/mongo/bin/mongos(_ZN5mongo20DBClientWithCommands13simpleCommandERKSsPNS_7BSONObjES2_+0x12a)[0x53fcca]
/local/mongo/bin/mongos(_ZN5mongo17ClientConnections7releaseERKSsPNS_12DBClientBaseE+0x10e)[0x577eae]
/local/mongo/bin/mongos(_ZN5boost19thread_specific_ptrIN5mongo17ClientConnectionsEE11delete_dataclEPv+0xac)[0x5789dc]
/local/mongo/bin/mongos(tls_destructor+0xb1)[0x69dae1]
===
Thu May 19 02:04:39 CursorCache at shutdown - sharded: 1 passthrough: 0
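
For illustration, the two backtraces are consistent with the following sequence: when the client thread ends, boost::thread_specific_ptr tears down the thread-local ClientConnections; its release() runs a command over the replica-set connection (simpleCommand -> runCommand -> DBClientReplicaSet::findOne), DBClientReplicaSet::checkMaster() cannot settle on a usable master and asserts, and the resulting abort during exit() is followed by a second fault (signal 11) while the remaining TLS destructors run. A minimal stand-alone sketch of that failure shape, using hypothetical stand-in names rather than the real mongos code:

#include <cassert>
#include <vector>

struct Member { bool isPrimary; bool isUsable; };

// Hypothetical stand-in for DBClientReplicaSet::checkMaster().
void checkMaster(const std::vector<Member>& members) {
    for (const Member& m : members)
        if (m.isPrimary && m.isUsable) return;
    // Asserting here turns "no usable master right now" into abort()/SIGABRT,
    // matching the __assert_fail and abort frames in the first backtrace.
    assert(false && "no master found");
}

// Hypothetical stand-in for ClientConnections::release(), which the backtrace
// shows running a command during thread-local-storage cleanup.
void releaseConnection(const std::vector<Member>& members) {
    checkMaster(members);
}

int main() {
    // Two of three members RECOVERING; the remaining primary momentarily reports
    // "not master" (as in the mongos log), so no member looks usable.
    std::vector<Member> members = { {false, false}, {true, false}, {false, false} };
    releaseConnection(members);   // aborts instead of failing the request cleanly
    return 0;
}

Run as written, the sketch dies on the assert rather than returning an error to the caller, which is the same end result the log shows for mongos.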

Looking at rs.status(), one of the replica sets was in a bad state:

xxxxxSet6:PRIMARY> rs.status();
{
    "set" : "xxxxxSet6",
    "date" : ISODate("2011-05-19T03:45:15Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "us0101amd106.xxxxx.gbl:27017",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 535081,
            "optime" : { "t" : 1305494831000, "i" : 43 },
            "optimeDate" : ISODate("2011-05-15T21:27:11Z"),
            "lastHeartbeat" : ISODate("2011-05-19T03:45:14Z"),
            "errmsg" : "error RS102 too stale to catch up"
        },
        {
            "_id" : 1,
            "name" : "us0101amd206",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "optime" : { "t" : 1305776715000, "i" : 3 },
            "optimeDate" : ISODate("2011-05-19T03:45:15Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "us0101amd306",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 535372,
            "optime" : { "t" : 1305528927000, "i" : 13 },
            "optimeDate" : ISODate("2011-05-16T06:55:27Z"),
            "lastHeartbeat" : ISODate("2011-05-19T03:45:14Z"),
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}
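
For reference (illustrative only, not the actual mongos selection code): in this status output, state 1 is PRIMARY, 2 is SECONDARY, and 3 is RECOVERING, and RECOVERING members are not eligible to serve requests. With two of the three members in state 3, the connection is down to a single usable node, so a transient "got not master" from us0101amd206 (as in the log above) leaves checkMaster() with no candidate at all. A small sketch of that eligibility count:

#include <iostream>
#include <string>
#include <vector>

struct MemberStatus {
    std::string name;
    int state;   // 1 = PRIMARY, 2 = SECONDARY, 3 = RECOVERING
};

// Only PRIMARY/SECONDARY members are usable targets.
bool eligible(const MemberStatus& m) {
    return m.state == 1 || m.state == 2;
}

int main() {
    // Member states as reported by rs.status() above.
    std::vector<MemberStatus> members = {
        {"us0101amd106:27017", 3},   // RECOVERING: too stale to catch up
        {"us0101amd206",       1},   // PRIMARY (briefly reported "not master")
        {"us0101amd306",       3},   // RECOVERING: too stale to catch up
    };

    int usable = 0;
    for (const MemberStatus& m : members)
        if (eligible(m)) ++usable;

    std::cout << "usable members: " << usable << "\n";   // prints 1
    return 0;
}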



 Comments   
Comment by Greg Studer [ 31/May/11 ]

Fixed by:

https://github.com/mongodb/mongo/commit/be7074a9dd95282231b2b2d34e7b61cd52978c05

Comment by Greg Studer [ 25/May/11 ]

Pretty sure this is an issue which has been addressed in 1.8.2 - upgrading to the official release once it is out (very soon) should solve this problem. Reopen if reproducible with 1.8.2.

EDIT: Reproduced with 1.8.2
