[SERVER-3044] signal 11 in mongos Created: 05/May/11  Updated: 12/Jul/16  Resolved: 25/May/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.1
Fix Version/s: 1.8.2

Type: Bug Priority: Major - P3
Reporter: ofer samocha Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Got the following crash after issue in SERVER-3043:

Received signal 11
Backtrace: 0x52e235 0x2aefd8f59070 0x2aefd8584219 0x54dcc8 0x61e5c4 0x61e6b8 0x5774d2 0x575630 0x575b31 0x62d61e 0x6312a3 0x66432c 0x6761c7 0x57ea3c 0x69ec30 0x2aefd85821b5 0x2aefd8ff636d
/usr/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib64/libc.so.6[0x2aefd8f59070]
/lib64/libpthread.so.0(pthread_mutex_lock+0x19)[0x2aefd8584219]
/usr/bin/mongos(_ZNK5mongo17ReplicaSetMonitor8containsERKSs+0x28)[0x54dcc8]
/usr/bin/mongos(_ZNK5mongo5Shard12containsNodeERKSs+0x94)[0x61e5c4]
/usr/bin/mongos(_ZN5mongo5Shard12isAShardNodeERKSs+0xe8)[0x61e6b8]
/usr/bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1e2)[0x5774d2]
/usr/bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2d0)[0x575630]
/usr/bin/mongos(_ZN5mongo15ShardConnectionC1ERKNS_5ShardERKSs+0xa1)[0x575b31]
/usr/bin/mongos(_ZN5mongo8Strategy7doQueryERNS_7RequestERKNS_5ShardE+0x4e)[0x62d61e]
/usr/bin/mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4a3)[0x6312a3]
/usr/bin/mongos(_ZN5mongo7Request7processEi+0x29c)[0x66432c]
/usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x77)[0x6761c7]
/usr/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x34c)[0x57ea3c]
/usr/bin/mongos(thread_proxy+0x80)[0x69ec30]
/lib64/libpthread.so.0[0x2aefd85821b5]
/lib64/libc.so.6(clone+0x6d)[0x2aefd8ff636d]
===
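For anyone triaging a similar trace: the faulting frame is pthread_mutex_lock+0x19 invoked from ReplicaSetMonitor::contains, which is the classic signature of locking a mutex whose owning object has already been destroyed (or was never constructed), e.g. a monitor freed by one thread while another thread is still calling into it. Below is a minimal, hypothetical C++ sketch of the lifetime-safe pattern; the Monitor and MonitorRegistry names are illustrative, not the actual mongos source.

#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for a shared, mutex-protected monitor object.
class Monitor {
public:
    bool contains(const std::string& host) const {
        std::lock_guard<std::mutex> lk(_lock);  // safe only while *this is alive
        return _hosts.count(host) > 0;
    }
    void addHost(const std::string& host) {
        std::lock_guard<std::mutex> lk(_lock);
        _hosts.insert(host);
    }
private:
    mutable std::mutex _lock;
    std::set<std::string> _hosts;
};

// Hands out shared_ptrs, so a concurrent remove() cannot free a Monitor
// (and the mutex inside it) out from under a thread still using it.
class MonitorRegistry {
public:
    std::shared_ptr<Monitor> get(const std::string& setName) {
        std::lock_guard<std::mutex> lk(_mapLock);
        std::shared_ptr<Monitor>& m = _monitors[setName];
        if (!m)
            m = std::make_shared<Monitor>();
        return m;
    }
    void remove(const std::string& setName) {
        std::lock_guard<std::mutex> lk(_mapLock);
        _monitors.erase(setName);  // drops only the map's reference
    }
private:
    std::mutex _mapLock;
    std::map<std::string, std::shared_ptr<Monitor> > _monitors;
};

int main() {
    MonitorRegistry registry;
    std::shared_ptr<Monitor> m = registry.get("rs0");
    m->addHost("amdbm001:10000");
    registry.remove("rs0");  // another thread tearing the set down...
    // ...but our shared_ptr still keeps the Monitor and its mutex alive:
    return m->contains("amdbm001:10000") ? 0 : 1;
}

A raw pointer held across the remove() would instead leave contains() locking freed memory, producing exactly this kind of SIGSEGV inside pthread_mutex_lock.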



 Comments   
Comment by Eliot Horowitz (Inactive) [ 30/Nov/11 ]

@alan - can you open a new thread with the full mongos log?

Comment by Alan Shang [ 30/Nov/11 ]

Got this on 2.0.1. This is the first time I've seen it.

Received signal 11
Backtrace: 0x5521f5 0x37cf430280 0x70f8f0 0x7e8dba 0x52992f 0x52b9e4 0x7feeb0 0x37d0006367 0x37cf4d2f7d
/usr/bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x5521f5]
/lib64/libc.so.6[0x37cf430280]
/usr/bin/mongos(_ZN5mongo8DBConfig15getChunkManagerERKSsbb+0x750)[0x70f8f0]
/usr/bin/mongos(_ZN5mongo17WriteBackListener3runEv+0x171a)[0x7e8dba]
/usr/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xbf)[0x52992f]
/usr/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74)[0x52b9e4]
/usr/bin/mongos(thread_proxy+0x80)[0x7feeb0]
/lib64/libpthread.so.0[0x37d0006367]
/lib64/libc.so.6(clone+0x6d)[0x37cf4d2f7d]
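These frames are printed with their mangled C++ symbol names. c++filt will decode them from the shell; the small standalone helper below does the same via the Itanium ABI demangler that c++filt itself uses (abi::__cxa_demangle).

#include <cxxabi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char** argv) {
    for (int i = 1; i < argc; ++i) {
        int status = 0;
        // __cxa_demangle mallocs the result; status == 0 means success.
        char* readable = abi::__cxa_demangle(argv[i], NULL, NULL, &status);
        std::printf("%s -> %s\n", argv[i],
                    status == 0 ? readable : "(not a mangled C++ name)");
        std::free(readable);
    }
    return 0;
}

For example, _ZN5mongo8DBConfig15getChunkManagerERKSsbb decodes to mongo::DBConfig::getChunkManager(std::string const&, bool, bool), so this crash happened in the WriteBackListener background thread while it was fetching a chunk manager.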

Comment by Greg Studer [ 25/May/11 ]

Reopen if you see the issue occur again, but we're pretty sure this is fixed in 1.8.2.

Comment by Greg Studer [ 23/May/11 ]

The newer 1.8.2 release should solve this problem; I would recommend upgrading to rc2 or waiting until 1.8.2 is fully released.

Comment by Paul Mokbel [ 23/May/11 ]

I've set up a second mongos in an attempt to mitigate the issue under intense load. Splitting the load between the two mongos processes has at least slowed the rate at which it crashes.

Comment by Paul Mokbel [ 19/May/11 ]

I'm having a similar issue. It just started happening today after we increased load significantly; it has crashed twice so far. Any ideas?

Received signal 11
Backtrace: [addresses garbled in the original output; frames from more than one thread appear interleaved below]
bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x5319a5]
/lib64/libc.so.6[0x37d54302d0]
/lib64/libc.so.6[0x37d547291e]
/usr/lib64/libstdc++.so.6(_ZSt29_Rb_tree_insert_and_rebalancebPSt18_Rb_tree_node_baseS0_RS_+0x5a)[0x37dcc64dda]
/lib64/libc.so.6(__libc_malloc+0x6e)[0x37d5474cde]
bin/mongos(_ZNSt8_Rb_treeISsSsSt9_IdentityISsESt4lessISsESaISsEE9_M_insertEPSt18_Rb_tree_node_baseS7_RKSs+0x62)[0x4c0e02]
/usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x37dccbd17d]
bin/mongos(_ZNSt8_Rb_treeISsSsSt9_IdentityISsESt4lessISsESaISsEE13insert_uniqueERKSs+0x143)[0x4c0fe3]
/usr/lib64/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x21)[0x37dcc9b801]
bin/mongos(_ZN5mongo10ClientInfo8addShardERKSs+0x24)[0x668f24]
/usr/lib64/libstdc++.so.6[0x37dcc9c555]
/usr/lib64/libstdc++.so.6(_ZNSsC1ERKSsmm+0x38)[0x37dcc9c688]
bin/mongos(_ZN5mongo22ShardingConnectionHook11onHandedOutEPNS_12DBClientBaseE+0x2e)[0x67ac3e]
bin/mongos(_ZN5mongo4Grid11getDBConfigESsbRKSs+0x5f)[0x5fb49f]
bin/mongos(_ZN5mongo16DBConnectionPool11onHandedOutEPNS_12DBClientBaseE+0x43)[0x534f73]
bin/mongos(_ZN5mongo7Request5resetEb+0xbd)[0x66706d]
bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x1b4)[0x578354]
bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0x12b)[0x67adfb]
bin/mongos(_ZN5mongo15ShardConnectionC1ERKSsS2_+0x76)[0x5786c6]
bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x272)[0x5817a2]
bin/mongos(_ZN5mongo15ClusteredCursor5queryERKSsiNS_7BSONObjEi+0x124)[0x586644]
bin/mongos(thread_proxy+0x80)[0x6a22b0]
bin/mongos(_ZN5mongo27SerialServerClusteredCursor4moreEv+0x134)[0x589194]
/lib64/libpthread.so.0[0x37d600673d]

... The log is very long. Running mongos 1.8.0.

Comment by ofers@imesh.com [ 10/May/11 ]

This bug is a few days old. It happened just after a mongos restart.

Comment by Eliot Horowitz (Inactive) [ 10/May/11 ]

Is it the same as last time?

Comment by ofers@imesh.com [ 10/May/11 ]

Stack only

Comment by Eliot Horowitz (Inactive) [ 10/May/11 ]

Is there a stack or an assert?

Comment by ofer samocha [ 09/May/11 ]

This time mongos restarted at 07:07:47 and crashed at 07:08:03. The config server was up at this time; there is no more relevant data in the log.

Lines like:

Thu May 5 07:07:48 [LockPinger] creating dist lock ping thread for: amdbm001:10001,amdbm003:10001,amdbm005:10001
Thu May 5 07:07:48 [LockPinger] SyncClusterConnection connecting to [amdbm001:10001]
Thu May 5 07:07:48 [LockPinger] SyncClusterConnection connecting to [amdbm003:10001]
Thu May 5 07:07:48 [LockPinger] SyncClusterConnection connecting to [amdbm005:10001]
Thu May 5 07:07:48 [conn2] creating WriteBackListener for: amdbm002:10000
Thu May 5 07:07:48 [conn2] creating WriteBackListener for: amdbm001:10000
Thu May 5 07:07:48 [conn2] creating WriteBackListener for: amdbm019:10000
Thu May 5 07:07:48 [conn2] creating WriteBackListener for: amdbm020:10000
Thu May 5 07:07:48 [conn3] creating WriteBackListener for: amdbm021:10000
Thu May 5 07:07:48 [conn3] creating WriteBackListener for: amdbm022:10000
...

Thu May 5 07:07:48 [Balancer] config servers and shards contacted successfully
Thu May 5 07:07:48 [Balancer] balancer id: SN10:27017 started at May 5 07:07:48
Thu May 5 07:07:48 [conn37] end connection 127.0.0.1:37903
Thu May 5 07:08:03 [mongosMain] connection accepted from 10.196.238.129:39876 #51
Then the crash.
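Those repeated "creating WriteBackListener for: <host>" lines correspond to mongos lazily starting one background writeback-listener thread per shard host the first time that host is touched. A rough, hypothetical sketch of that lazy-initialization pattern (not the actual mongos source):

#include <cstdio>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the per-host background listener.
class WriteBackListener {
public:
    explicit WriteBackListener(const std::string& host) : _host(host) {}
    void run() {
        // Would poll the shard at _host for writeback entries (elided here).
    }
private:
    std::string _host;
};

std::mutex g_cacheLock;
std::map<std::string, WriteBackListener*> g_cache;  // lives for the process lifetime

// Start at most one listener thread per shard host, on first use.
void initWriteBackListener(const std::string& host) {
    std::lock_guard<std::mutex> lk(g_cacheLock);
    if (g_cache.count(host))
        return;  // listener already running for this host
    std::printf("creating WriteBackListener for: %s\n", host.c_str());
    WriteBackListener* wbl = new WriteBackListener(host);
    g_cache[host] = wbl;
    std::thread([wbl] { wbl->run(); }).detach();
}

int main() {
    initWriteBackListener("amdbm002:10000");
    initWriteBackListener("amdbm002:10000");  // no-op: already created
    return 0;
}

The cache is guarded by a single mutex, so two request threads racing on the same host still create only one listener; the log above shows conn2 and conn3 each creating listeners for distinct hosts.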

Comment by Greg Studer [ 09/May/11 ]

Not sure exactly what you mean by "after issue" - was this seen after a mongos restart, and were the config servers back online by this point? Also, do you have the logs leading up to the crash, between those in SERVER-3043 and this error?
