[SERVER-3188] CLONE - mongos crash with "Received signal 6" Created: 03/Jun/11  Updated: 12/Jul/16  Resolved: 03/Jun/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.1
Fix Version/s: 1.8.2

Type: Bug Priority: Major - P3
Reporter: Jim Rubenstein Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Tue Apr 5 09:55:34 /home/david/mongodb/latest/bin/mongos db version v1.8.1-rc1, pdfile version 4.5 starting (--help for usage)
Tue Apr 5 09:55:34 git version: c340b4882b752b9e9fdae4db2738ee502cd254e3
Tue Apr 5 09:55:34 build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41


Operating System: Linux
Participants:

 Description   

Tue Apr 5 09:55:33 [mongosMain] dbexit: received signal 15 rc:0 received signal 15
Tue Apr 5 09:55:34 /home/david/mongodb/latest/bin/mongos db version v1.8.1-rc1, pdfile version 4.5 starting (--help for usage)
Tue Apr 5 09:55:34 git version: c340b4882b752b9e9fdae4db2738ee502cd254e3
Tue Apr 5 09:55:34 build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Tue Apr 5 09:55:34 [websvr] web admin interface listening on port 28017
Tue Apr 5 09:55:34 [websvr] couldn't unlink socket file /tmp/mongodb-28017.sockerrno:1 Operation not permitted skipping
Tue Apr 5 09:55:34 [mongosMain] waiting for connections on port 27017
Tue Apr 5 09:55:34 [mongosMain] couldn't unlink socket file /tmp/mongodb-27017.sockerrno:1 Operation not permitted skipping
Tue Apr 5 09:55:34 [Balancer] about to contact config servers and shards
Tue Apr 5 09:55:34 [Balancer] updated set (set1) to: set1/rs1a:27018,rs1b:27018
Tue Apr 5 09:55:34 [ReplicaSetMonitorWatcher] starting
Tue Apr 5 09:55:34 [Balancer] updated set (set2) to: set2/rs2a:27018,rs2b:27018
Tue Apr 5 09:55:34 [Balancer] updated set (set3) to: set3/rs3a:27018,rs3b:27018
Tue Apr 5 09:55:34 [Balancer] config servers and shards contacted successfully
Tue Apr 5 09:55:34 [Balancer] balancer id: ad1:27017 started at Apr 5 09:55:34
Tue Apr 5 09:55:34 [LockPinger] creating dist lock ping thread for: config1:27019
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs1a:27018
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs1b:27018
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs2a:27018
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs2b:27018
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs3a:27018
Tue Apr 5 09:55:34 [conn2] creating WriteBackListener for: rs3b:27018
Tue Apr 5 10:00:04 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:05:04 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:10:04 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:15:04 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:20:04 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:25:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:30:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:35:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:40:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:45:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:46:34 [conn69] warning: splitChunk failed - cmd: { splitChunk: "sd.metrics_110405", keyPattern: { accId: 1, sId: 1 }, min: { accId: 2461, sId: 35 }, max: { accId: 2845, sId: 2 }, from: "set2/rs2a:27018,rs2b:27018", splitKeys: [ { accId: 2596, sId: 11 } ], shardId: "sd.metrics_110405-accId_2461sId_35", configdb: "config1:27019" } result: { currMin: { accId: 2461, sId: 35 }, currMax: { accId: 2596, sId: 11 }, requestedMin: { accId: 2461, sId: 35 }, requestedMax: { accId: 2845, sId: 2 }, errmsg: "chunk boundaries are outdated (likely a split occurred)", ok: 0.0 }
Tue Apr 5 10:50:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 10:55:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:00:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:05:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:10:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:15:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:20:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:25:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:29:09 [conn124] ns: sd.servers ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 11:29:09 [conn124] ns: sd.metricsLatest ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 11:29:09 [conn124] ns: sd.alertsTriggered ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 11:30:05 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:35:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:38:40 [conn128] autosplitted sd.metrics_110405 shard: ns:sd.metrics_110405 at: shard2:set2/rs2a:27018,rs2b:27018 lastmod: 1|17 min: { accId: 2959, sId: 12 } max: { accId: 3177, sId: 6 } on: { accId: 3441, sId: 2 } (splitThreshold 209715200)
Tue Apr 5 11:40:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:45:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:50:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 11:55:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:00:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:05:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:10:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:15:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:20:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:25:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:30:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:35:06 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:40:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:45:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:50:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:55:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 12:56:16 [conn124] ns: sd.servers ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 12:56:16 [conn124] ns: sd.metricsLatest ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 12:56:16 [conn124] ns: sd.alertsTriggered ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 12:56:16 [conn124] AssertionException in process: ns: sd.alertsLog doWRite
Tue Apr 5 12:56:24 [conn124] ns: sd.users ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 12:59:43 [conn124] ns: sd.usersPhones ClusteredCursor::query ShardConnection had to change attempt: 0
Tue Apr 5 13:00:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Tue Apr 5 13:05:07 [LockPinger] dist_lock pinged successfully for: ad1:1301997334:1804289383
Received signal 6
Backtrace: 0x52e235 0x3b71e302d0 0x3b71e30265 0x3b71e31d10 0x3b71e296e6 0x697f22 0x5035ab 0x504e64 0x69ec30 0x3b7260673d 0x3b71ed3f6d
/home/david/mongodb/latest/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib64/libc.so.6[0x3b71e302d0]
/lib64/libc.so.6(gsignal+0x35)[0x3b71e30265]
/lib64/libc.so.6(abort+0x110)[0x3b71e31d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3b71e296e6]
/home/david/mongodb/latest/bin/mongos(_ZN5mongo17WriteBackListener3runEv+0x19d2)[0x697f22]
/home/david/mongodb/latest/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0x12b)[0x5035ab]
/home/david/mongodb/latest/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74)[0x504e64]
/home/david/mongodb/latest/bin/mongos(thread_proxy+0x80)[0x69ec30]
/lib64/libpthread.so.0[0x3b7260673d]
/lib64/libc.so.6(clone+0x6d)[0x3b71ed3f6d]
===
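
For anyone triaging similar dumps, the frames above can be read mechanically. The following is a minimal sketch in Python, not part of the original report; the helper names `parse_frame` and `qualified_name` are hypothetical. It assumes frames use the `module(symbol+offset)[address]` layout seen in this log, and it decodes only the plain length-prefixed name components of a `_ZN...` symbol (it is not a full demangler and does not recover argument types or template arguments).

```python
import re

# Matches frame lines of the form seen in the log above, e.g.
#   /path/mongos(_ZN5mongo17WriteBackListener3runEv+0x19d2)[0x697f22]
#   /lib64/libc.so.6[0x3b71e302d0]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def qualified_name(mangled):
    """Decode the leading length-prefixed components of a `_ZN...` symbol.

    Parsing stops at the first character that is not a length digit
    (template arguments, substitutions, or the closing `E`), so only the
    namespace/class/function prefix is returned. Symbols that do not start
    with `_ZN` are returned unchanged.
    """
    if not mangled.startswith("_ZN"):
        return mangled
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts) if parts else mangled


def parse_frame(line):
    """Split one backtrace line into its fields, or return None if it is not a frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    symbol = match.group("symbol") or ""
    return {
        "module": match.group("module"),
        "function": qualified_name(symbol) if symbol else None,
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


if __name__ == "__main__":
    frame = parse_frame(
        "/home/david/mongodb/latest/bin/mongos"
        "(_ZN5mongo17WriteBackListener3runEv+0x19d2)[0x697f22]"
    )
    print(frame["function"], frame["offset"])
```

Applied to the trace above, the first mongos frame below `__assert_fail` decodes to `mongo::WriteBackListener::run`, which suggests the abort came from an assertion inside the WriteBackListener thread.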



 Comments   
Comment by Eliot Horowitz (Inactive) [ 10/Aug/11 ]

We don't build RPMs for release candidates, just tarballs; see: http://www.mongodb.org/downloads

Comment by Edward Wei [ 10/Aug/11 ]

@Eliot,
I don't see any package for 1.8.3-rc0 on http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
There are only 1.8.2 and 1.9.1. Did I misunderstand what you meant? Thank you.

Comment by Eliot Horowitz (Inactive) [ 09/Aug/11 ]

@edward - just to follow up - there is a 1.8.3-rc0 package up

Comment by Edward Wei [ 01/Aug/11 ]

@Eliot,
Thank you for the quick reply. Do you plan to package an RPM for 1.8.3-rc0? We manage all of our deployments through RPMs. Thanks again.

Comment by Eliot Horowitz (Inactive) [ 29/Jul/11 ]

@edward - that looks different, and similar to something fixed in 1.8.3.
Can you try that, and if it happens again, open a new ticket?

Comment by Edward Wei [ 29/Jul/11 ]

Hi All,
We still receive signal 6 with version 1.8.2 (we use sharding + replica sets, with multiple mongos instances):

Fri Jul 29 10:06:47 [conn111] end connection 10.42.91.33:34169
Received signal 6
Backtrace: 0x52f8f5 0x3b776302d0 0x3b77630265 0x3b77631d10 0x3b776296e6 0x69d485 0x50454b 0x505e04 0x6a50a0 0x3b7820673d 0x3b776d3f6d
/usr/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52f8f5]
/lib64/libc.so.6[0x3b776302d0]
/lib64/libc.so.6(gsignal+0x35)[0x3b77630265]
/lib64/libc.so.6(abort+0x110)[0x3b77631d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3b776296e6]
/usr/bin/mongos(_ZN5mongo17WriteBackListener3runEv+0x1c15)[0x69d485]
/usr/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0x12b)[0x50454b]
/usr/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74)[0x505e04]
/usr/bin/mongos(thread_proxy+0x80)[0x6a50a0]
/lib64/libpthread.so.0[0x3b7820673d]
/lib64/libc.so.6(clone+0x6d)[0x3b776d3f6d]
===

Thank you for your help.

Comment by Jim Rubenstein [ 03/Jun/11 ]

Seems to be fixed in 1.8.2-rc3 (at least, we haven't run into the error yet). Awesome.

Comment by Eliot Horowitz (Inactive) [ 03/Jun/11 ]

Can you try 1.8.2-rc3?

Comment by Jim Rubenstein [ 03/Jun/11 ]

Cloned this bug from https://jira.mongodb.org/browse/SERVER-2900

Still getting this error in the mongos error log on version 1.8.2-rc1.

Relevant part of the log:

Fri Jun 3 08:25:02 [conn14] got not master for: goyle:27018
Received signal 6
Backtrace: 0x52f4f5 0x7f6e5cdc8100 0x7f6e5cdc8095 0x7f6e5cdc9af0 0x7f6e5cdc12df 0x552d43 0x553788 0x54b6bc 0x539569 0x66bae4 0x63d9c5 0x57d60c 0x635552 0x666cfc 0x67ba67 0x58027c 0x6a2d10 0x7f6e5d8983f7 0x7f6e5ce6dbbd
mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52f4f5]
/lib/libc.so.6[0x7f6e5cdc8100]
/lib/libc.so.6(gsignal+0x35)[0x7f6e5cdc8095]
/lib/libc.so.6(abort+0x110)[0x7f6e5cdc9af0]
/lib/libc.so.6(__assert_fail+0xef)[0x7f6e5cdc12df]
mongos(_ZN5mongo18DBClientReplicaSet11checkMasterEv+0x4b3)[0x552d43]
mongos(_ZN5mongo18DBClientReplicaSet7findOneERKSsRKNS_5QueryEPKNS_7BSONObjEi+0x128)[0x553788]
mongos(_ZN5mongo20DBClientWithCommands10runCommandERKSsRKNS_7BSONObjERS3_i+0x8c)[0x54b6bc]
mongos(_ZN5mongo20DBClientWithCommands20getLastErrorDetailedEv+0x69)[0x539569]
mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuilderEb+0x1fc4)[0x66bae4]
mongos(_ZN5mongo11dbgrid_cmds23CmdShardingGetLastError3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x55)[0x63d9c5]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONObjBuilderE+0x67c)[0x57d60c]
mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x262)[0x635552]
mongos(_ZN5mongo7Request7processEi+0x29c)[0x666cfc]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x77)[0x67ba67]
mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x34c)[0x58027c]
mongos(thread_proxy+0x80)[0x6a2d10]
/lib/libpthread.so.0[0x7f6e5d8983f7]
/lib/libc.so.6(clone+0x6d)[0x7f6e5ce6dbbd]
===
Fri Jun 3 08:25:02 CursorCache at shutdown - sharded: 1 passthrough: 2
