[SERVER-28001] Mongodb Crashed with the Got signal: 6 (Aborted) Created: 14/Feb/17  Updated: 31/May/17  Resolved: 21/Mar/17

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Abhishek Manocha
Assignee: Mark Agarunov
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Screenshot at Feb 14 18-48-49.png (PNG file), arbitar logs.txt (text file), primary logs.txt (text file), secondary logs.txt (text file)
Issue Links:
Duplicate
is duplicated by SERVER-28002 CLONE - Mongodb Crashed with the Got ... Closed
Related
related to SERVER-25659 InputStreamSecureRandom should open t... Closed

 Description   

We have repeatedly seen MongoDB crash with the following setup and error:

Ubuntu
Master, slave, and arbiter setup
Got signal: 6 (Aborted)

The most recent stack trace is:

2017-02-14T11:17:11.267+0000 F -        [conn260245] Got signal: 6 (Aborted).
 
 0xf605f9 0xf5fc72 0xf60026 0x7f6a52f01d40 0x7f6a52f01cc9 0x7f6a52f050d8 0xda2869 0x88da32 0x8dd89d 0x8de5e1 0x8b3627 0x8d1e77 0x8d3936 0x9bebe4 0x9bfb6d 0x9c087b 0xb94a3a 0xaa48f0 0x7e9c4d 0xf1dabb 0x7f6a53ec4182 0x7f6a52fc547d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B605F9"},{"b":"400000","o":"B5FC72"},{"b":"400000","o":"B60026"},{"b":"7F6A52ECB000","o":"36D40"},{"b":"7F6A52ECB000","o":"36CC9"},{"b":"7F6A52ECB000","o":"3A0D8"},{"b":"400000","o":"9A2869"},{"b":"400000","o":"48DA32"},{"b":"400000","o":"4DD89D"},{"b":"400000","o":"4DE5E1"},{"b":"400000","o":"4B3627"},{"b":"400000","o":"4D1E77"},{"b":"400000","o":"4D3936"},{"b":"400000","o":"5BEBE4"},{"b":"400000","o":"5BFB6D"},{"b":"400000","o":"5C087B"},{"b":"400000","o":"794A3A"},{"b":"400000","o":"6A48F0"},{"b":"400000","o":"3E9C4D"},{"b":"400000","o":"B1DABB"},{"b":"7F6A53EBC000","o":"8182"},{"b":"7F6A52ECB000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "3.13.0-48-generic", "version" : "#80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFD6C41000", "elfType" : 3 }, { "b" : "7F6A53EBC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F6A53CB4000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F6A53AB0000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F6A537AC000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F6A534A6000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F6A53290000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F6A52ECB000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F6A540DA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf605f9]
 mongod(+0xB5FC72) [0xf5fc72]
 mongod(+0xB60026) [0xf60026]
 libc.so.6(+0x36D40) [0x7f6a52f01d40]
 libc.so.6(gsignal+0x39) [0x7f6a52f01cc9]
 libc.so.6(abort+0x148) [0x7f6a52f050d8]
 mongod(_ZN5mongo12SecureRandom6createEv+0x1B9) [0xda2869]
 mongod(_ZN5mongo5scram19generateCredentialsERKSsi+0x22) [0x88da32]
 mongod(_ZN5mongo31SaslSCRAMSHA1ServerConversation10_firstStepERSt6vectorISsSaISsEEPSs+0x150D) [0x8dd89d]
 mongod(_ZN5mongo31SaslSCRAMSHA1ServerConversation4stepERKNS_10StringDataEPSs+0x2F1) [0x8de5e1]
 mongod(_ZN5mongo31NativeSaslAuthenticationSession4stepERKNS_10StringDataEPSs+0x27) [0x8b3627]
 mongod(+0x4D1E77) [0x8d1e77]
 mongod(+0x4D3936) [0x8d3936]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9bebe4]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC1D) [0x9bfb6d]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9c087b]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x77A) [0xb94a3a]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xaa48f0]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x7e9c4d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf1dabb]
 libpthread.so.0(+0x8182) [0x7f6a53ec4182]
 libc.so.6(clone+0x6D) [0x7f6a52fc547d]
-----  END BACKTRACE  -----
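
As a reading aid for the backtrace above: each JSON frame pairs a load base ("b") with an offset ("o"), and the bracketed address on the symbolized lines appears to be their sum. A minimal shell sketch using values copied from the first mongod frame and the first libc.so.6 frame; the helper name frame_addr is hypothetical:

```shell
#!/bin/sh
# frame_addr BASE OFFSET: add a hex load base and a hex offset (both without
# a 0x prefix) and print the absolute address in lowercase hex.
frame_addr() {
    printf '0x%x\n' "$(( 0x$1 + 0x$2 ))"
}

frame_addr 400000 B605F9        # first mongod frame       -> 0xf605f9
frame_addr 7F6A52ECB000 36D40   # first libc.so.6 frame    -> 0x7f6a52f01d40
```

The two results match the bracketed addresses of the printStackTrace frame and the first libc.so.6 frame in the listing.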

On the EC2 instance (we run this in AWS), the Network In and Network Out graphs look suspicious (see the attached screenshot).



 Comments   
Comment by Mark Agarunov [ 21/Mar/17 ]

Hello akmanocha,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Mark Agarunov [ 28/Feb/17 ]

Hello akmanocha,

My apologies, I overlooked the fact that you are using MongoDB version 3.0.4, which does not generate the diagnostic data; that feature was introduced in version 3.2. My recommendation would be to upgrade to a newer version, as many fixes have been made since 3.0.4 and the behavior you're seeing may no longer occur in a more recent release. Alternatively, if you are unable to upgrade, please run the following commands and provide the ss.log and iostat.log files that they create:

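# Sampling interval in seconds
delay=1
# Log serverStatus output (including tcmalloc statistics) as JSON once per interval
mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:true}))); sleep(1000*${delay:?})}" >ss.log &
# Log extended, timestamped device I/O statistics in kilobytes once per interval
iostat -k -t -x ${delay:?} >iostat.log &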

Please leave this running until the issue happens again so that there is a complete log.

Thanks,
Mark

Comment by Abhishek Manocha [ 28/Feb/17 ]

What is the diagnostic.data directory? I am not aware of it. What is its default location?

Comment by Mark Agarunov [ 21/Feb/17 ]

Hello akmanocha,

Thank you for the additional information, and my apologies for the delay. We are still investigating this issue; however, I suspect it may be related to SERVER-25659, which could cause the mongod process to hit the open files limit in some situations. To investigate further, please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket. This will give us additional insight into what may be causing the issue.
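
The archiving step could look roughly like the sketch below. The function name and the default archive name are hypothetical, and the example path is an assumption; use the dbPath from your own mongod configuration. Per the 28/Feb/17 comment, the directory is only written by versions 3.2 and later, so the sketch checks that it exists first:

```shell
#!/bin/sh
# archive_diagnostic_data DBPATH [OUTFILE]
# Packs DBPATH/diagnostic.data into a gzip-compressed tarball.
# -C makes the paths inside the archive relative to DBPATH.
archive_diagnostic_data() {
    dbpath=$1
    out=${2:-diagnostic.data.tar.gz}
    if [ ! -d "$dbpath/diagnostic.data" ]; then
        echo "no diagnostic.data directory under $dbpath" >&2
        return 1
    fi
    tar -czf "$out" -C "$dbpath" diagnostic.data
}

# Example (the path is an assumption, not taken from this ticket):
# archive_diagnostic_data /var/lib/mongodb
```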

Thanks,
Mark

Comment by Abhishek Manocha [ 21/Feb/17 ]

Hey, is there any update on this?

Should I clone this or convert it to a bug?

Comment by Abhishek Manocha [ 15/Feb/17 ]

Hi Mark,

Thanks for the input. How can I confirm that this is an open files issue? Could you share the reasoning that led you to that conclusion?

The ulimit for mongouser (the owner of this specific mongo process) is 64000, both hard and soft, so I suspect a different cause. Please find further logs attached.
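
A limit shown by ulimit in a user's shell is not necessarily the one a running daemon inherited, so it can help to read the limits of the mongod process itself. A minimal sketch, assuming a Linux host with /proc mounted; the function name is hypothetical:

```shell
#!/bin/sh
# max_open_files PID: print the soft and hard "Max open files" limits that
# are in effect for a running process, as listed in /proc/PID/limits.
max_open_files() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$1/limits"
}

# Example for a running mongod (assumes exactly one mongod process):
# max_open_files "$(pidof mongod)"
```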

In the attached arbitar logs.txt, the very first line is:
2017-02-14T11:08:41.373+0000 I REPL [ReplicationExecutor] ip-10-2-4-15:27017 is trying to elect itself but 10.2.3.17:27017 is already primary

Can you please help me understand what this line means? Why does the secondary suddenly want to become primary?
If you look at primary logs.txt for that time, I can't see anything wrong.
Then there are "unable to connect" errors to the primary in the arbiter logs. But primary logs.txt shows queries still executing at that time, so this is puzzling.

Finally, both go down at around 11:17 (9 minutes later) with Got signal: 6 (Aborted).

It could be an open files issue, but the root cause is not clear to me.

Thanks

Comment by Mark Agarunov [ 14/Feb/17 ]

Hello akmanocha,

Thank you for the report. Looking over the provided output, this appears to be due to the number of open files being greater than what is allowed by your ulimits configuration. Please try increasing this limit as described in the documentation to see if this resolves the issue you are seeing.
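
A rough way to test this hypothesis on Linux is to compare the number of file descriptors the process currently holds with its soft limit. This is a sketch rather than part of the original advice; it assumes /proc is available and that you may list /proc/PID/fd (typically as the process owner or root). The function name is hypothetical:

```shell
#!/bin/sh
# fd_usage PID: print "<open descriptors> <soft limit>" for a process.
fd_usage() {
    pid=$1
    # Each entry in /proc/PID/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # Field 4 of the "Max open files" row is the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$open $limit"
}

# Example (assumes exactly one mongod process):
# fd_usage "$(pidof mongod)"
```

If the first number approaches the second while the server is under load, descriptor exhaustion becomes a more plausible explanation.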

If this behavior is still present after increasing the open file limits, please provide the full logs from mongod and we will continue investigating this behavior.

Thanks,
Mark
