[SERVER-22820] Crash in 3.0.5 Created: 23/Feb/16  Updated: 05/Mar/16  Resolved: 05/Mar/16

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I see a crash on my 3.0.5 WiredTiger server. Here is the relevant section of the log:

2016-02-22T03:24:54.330+0000 F -        [ReplExecNetThread-1] Got signal: 6 (Aborted).
 
0xf74fc9 0xf74642 0xf749f6 0x7f4f4e13a670 0x7f4f4e13a5f7 0x7f4f4e13bce8 0xdacd29 0x89421d 0x894e27 0x8671b7 0x88f2b6 0x84d3a1 0x853dee 0x84640d 0xc3a0ad 0xc3a5d4 0xc3ad8c 0xc3b39d 0xfc3104
0x7f4f4f748dc5 0x7f4f4e1fbbdd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B74FC9"},{"b":"400000","o":"B74642"},{"b":"400000","o":"B749F6"},{"b":"7F4F4E105000","o":"35670"},{"b":"7F4F4E105000","o":"355F7"},{"b":"7F4F4E105000","o":"3
6CE8"},{"b":"400000","o":"9ACD29"},{"b":"400000","o":"49421D"},{"b":"400000","o":"494E27"},{"b":"400000","o":"4671B7"},{"b":"400000","o":"48F2B6"},{"b":"400000","o":"44D3A1"},{"b":"400000","
o":"453DEE"},{"b":"400000","o":"44640D"},{"b":"400000","o":"83A0AD"},{"b":"400000","o":"83A5D4"},{"b":"400000","o":"83AD8C"},{"b":"400000","o":"83B39D"},{"b":"400000","o":"BC3104"},{"b":"7F4
F4F741000","o":"7DC5"},{"b":"7F4F4E105000","o":"F6BDD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux
", "release" : "3.10.37-47.135.amzn1.x86_64", "version" : "#1 SMP Fri Apr 18 03:28:26 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F2642585C8D
FF37F942ABBAF84DEA258651EC3C8" }, { "b" : "7FFF304DB000", "elfType" : 3, "buildId" : "A9D61CE6B4FC467291ED5B3AD5418A1F5DEA012E" }, { "b" : "7F4F4F741000", "path" : "/lib64/libpthread.so.0",
"elfType" : 3, "buildId" : "E5E575776DAD20ADE8FD0CAF17897C9D89020A87" }, { "b" : "7F4F4F4D4000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "B949A1349FF0E93409055F00BFD60
F758EE8FA02" }, { "b" : "7F4F4F0EF000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F444116328F54797379DAA3B0FD53CF6C829F22A" }, { "b" : "7F4F4EEE7000", "path" : "/lib64/l
ibrt.so.1", "elfType" : 3, "buildId" : "42833B65941483A611C40EA7D32F56EA83EA6E93" }, { "b" : "7F4F4ECE3000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "6335077ACD51527BE9F2F18
451A88E2B7350C5B6" }, { "b" : "7F4F4E9DF000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "5E2AE3B2E1D3984B4DE32FE73D221D8C425516E5" }, { "b" : "7F4F4E6DD000", "path" :
"/lib64/libm.so.6", "elfType" : 3, "buildId" : "BB312C4A65B8FD830C148612CBEACEACC8B08E4F" }, { "b" : "7F4F4E4C7000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "00FA2883FB47
B1327397BBF167C52F51A723D013" }, { "b" : "7F4F4E105000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "D84E3AFDFF3E164A09C125F85B5DCABC6F545B5E" }, { "b" : "7F4F4F95D000", "path"
: "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "7B7BF8FEEF1A9C627EF90CA5C9188EFD4DA2DDD2" }, { "b" : "7F4F4DEB9000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "build
Id" : "E203354E7F907ACC8C3028FE465541B666DCFBA0" }, { "b" : "7F4F4DBD4000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "D769C8FFAF8772FDA55031ABF2E167DF2207E378" }, { "b" : "
7F4F4D9D1000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7F4F4D79F000", "path" : "/lib64/libk5crypto.so.3", "el
fType" : 3, "buildId" : "6C2243D37143F7FD1E16ED1F6CE4D7F16C2D7EF1" }, { "b" : "7F4F4D589000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D
1" }, { "b" : "7F4F4D37A000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "744272FEAAABCE629AB9E11FAA4A96AEBE8BC2B4" }, { "b" : "7F4F4D177000", "path" : "/lib64/libkeyu
tils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F4F4CF5D000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "47EC2C63132D25E4FE8
3F77023DA1A66457A88F1" }, { "b" : "7F4F4CD3C000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf74fc9]
 mongod(+0xB74642) [0xf74642]
 mongod(+0xB749F6) [0xf749f6]
 libc.so.6(+0x35670) [0x7f4f4e13a670]
 libc.so.6(gsignal+0x37) [0x7f4f4e13a5f7]
 libc.so.6(abort+0x148) [0x7f4f4e13bce8]
 mongod(_ZN5mongo12SecureRandom6createEv+0x1B9) [0xdacd29]
 mongod(_ZN5mongo31SaslSCRAMSHA1ClientConversation10_firstStepEPSs+0x16D) [0x89421d]
 mongod(_ZN5mongo31SaslSCRAMSHA1ClientConversation4stepERKNS_10StringDataEPSs+0x247) [0x894e27]
 mongod(_ZN5mongo23NativeSaslClientSession4stepERKNS_10StringDataEPSs+0x27) [0x8671b7]
 mongod(+0x48F2B6) [0x88f2b6]
 mongod(_ZN5mongo20DBClientWithCommands5_authERKNS_7BSONObjE+0x191) [0x84d3a1]
 mongod(_ZN5mongo18DBClientConnection5_authERKNS_7BSONObjE+0x16E) [0x853dee]
 mongod(_ZN5mongo20DBClientWithCommands4authERKNS_7BSONObjE+0x1D) [0x84640d]
 mongod(_ZN5mongo4repl20NetworkInterfaceImpl14ConnectionPool17acquireConnectionERKNS_11HostAndPortENS_6Date_tEN5boost9date_time18subsecond_durationINS7_10posix_time13time_durationELl1000EEE+0x26D) [0xc3a0ad]
 mongod(_ZN5mongo4repl20NetworkInterfaceImpl11_runCommandERKNS0_19ReplicationExecutor20RemoteCommandRequestE+0xC4) [0xc3a5d4]
 mongod(_ZN5mongo4repl20NetworkInterfaceImpl23_consumeNetworkRequestsEv+0x15C) [0xc3ad8c]
 mongod(_ZN5mongo4repl20NetworkInterfaceImpl27_requestProcessorThreadBodyEPS1_RKSs+0x8D) [0xc3b39d]
 mongod(+0xBC3104) [0xfc3104]
 libpthread.so.0(+0x7DC5) [0x7f4f4f748dc5]
 libc.so.6(clone+0x6D) [0x7f4f4e1fbbdd]
-----  END BACKTRACE  -----
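
The raw addresses printed before the backtrace can be reproduced from the JSON block: each frame's absolute address is its module base ("b") plus its offset ("o"), both in hexadecimal. A minimal Python sketch, using two frames copied from the trace above; the function name resolve_frames is illustrative only and is not part of any MongoDB tooling:

```python
import json
from typing import List


def resolve_frames(backtrace_json: str) -> List[int]:
    """Return the absolute address (base + offset) of every frame."""
    doc = json.loads(backtrace_json)
    return [int(frame["b"], 16) + int(frame["o"], 16) for frame in doc["backtrace"]]


# Two frames copied from the first backtrace: one in mongod, one in libc.
sample = '{"backtrace":[{"b":"400000","o":"B74FC9"},{"b":"7F4F4E105000","o":"35670"}]}'

for address in resolve_frames(sample):
    print(hex(address))
# 0xf74fc9
# 0x7f4f4e13a670
```

These match the bracketed addresses on the printStackTrace and first libc.so.6 lines of the symbolized trace.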

Further down in the traces I see a similar stack, but with a different error:

2016-02-22T21:07:53.557+0000 I NETWORK  [initandlisten] connection accepted from 10.178.82.244:36048 #8125 (4887 connections now open)
2016-02-22T21:07:53.560+0000 I NETWORK  [initandlisten] connection accepted from 10.239.128.188:32840 #8126 (4888 connections now open)
2016-02-22T21:07:53.577+0000 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files
2016-02-22T21:07:53.577+0000 E NETWORK  [initandlisten] Out of file descriptors. Waiting one second before trying to accept more connections.
2016-02-22T21:07:53.587+0000 F -        [conn8118] Got signal: 6 (Aborted).
 
 0xf74fc9 0xf74642 0xf749f6 0x7faddfbd8670 0x7faddfbd85f7 0x7faddfbd9ce8 0xdacd29 0x8ed8a2 0x8ee401 0x8c35e7 0x8e1ce7 0x8e37a6 0x9ced74 0x9cfcfd 0x9d0a0b 0xba1eea 0xab38d0 0x7fb82d 0xf2639b
0x7fade11e6dc5 0x7faddfc99bdd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B74FC9"},{"b":"400000","o":"B74642"},{"b":"400000","o":"B749F6"},{"b":"7FADDFBA3000","o":"35670"},{"b":"7FADDFBA3000","o":"355F7"},{"b":"7FADDFBA3000","o":"3
6CE8"},{"b":"400000","o":"9ACD29"},{"b":"400000","o":"4ED8A2"},{"b":"400000","o":"4EE401"},{"b":"400000","o":"4C35E7"},{"b":"400000","o":"4E1CE7"},{"b":"400000","o":"4E37A6"},{"b":"400000","
o":"5CED74"},{"b":"400000","o":"5CFCFD"},{"b":"400000","o":"5D0A0B"},{"b":"400000","o":"7A1EEA"},{"b":"400000","o":"6B38D0"},{"b":"400000","o":"3FB82D"},{"b":"400000","o":"B2639B"},{"b":"7FA
DE11DF000","o":"7DC5"},{"b":"7FADDFBA3000","o":"F6BDD"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux
", "release" : "3.10.37-47.135.amzn1.x86_64", "version" : "#1 SMP Fri Apr 18 03:28:26 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F2642585C8D
FF37F942ABBAF84DEA258651EC3C8" }, { "b" : "7FFFE1AFE000", "elfType" : 3, "buildId" : "A9D61CE6B4FC467291ED5B3AD5418A1F5DEA012E" }, { "b" : "7FADE11DF000", "path" : "/lib64/libpthread.so.0",
"elfType" : 3, "buildId" : "E5E575776DAD20ADE8FD0CAF17897C9D89020A87" }, { "b" : "7FADE0F72000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "B949A1349FF0E93409055F00BFD60
F758EE8FA02" }, { "b" : "7FADE0B8D000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "F444116328F54797379DAA3B0FD53CF6C829F22A" }, { "b" : "7FADE0985000", "path" : "/lib64/l
ibrt.so.1", "elfType" : 3, "buildId" : "42833B65941483A611C40EA7D32F56EA83EA6E93" }, { "b" : "7FADE0781000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "6335077ACD51527BE9F2F18
451A88E2B7350C5B6" }, { "b" : "7FADE047D000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "5E2AE3B2E1D3984B4DE32FE73D221D8C425516E5" }, { "b" : "7FADE017B000", "path" :
"/lib64/libm.so.6", "elfType" : 3, "buildId" : "BB312C4A65B8FD830C148612CBEACEACC8B08E4F" }, { "b" : "7FADDFF65000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "00FA2883FB47
B1327397BBF167C52F51A723D013" }, { "b" : "7FADDFBA3000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "D84E3AFDFF3E164A09C125F85B5DCABC6F545B5E" }, { "b" : "7FADE13FB000", "path"
: "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "7B7BF8FEEF1A9C627EF90CA5C9188EFD4DA2DDD2" }, { "b" : "7FADDF957000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "build
Id" : "E203354E7F907ACC8C3028FE465541B666DCFBA0" }, { "b" : "7FADDF672000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "D769C8FFAF8772FDA55031ABF2E167DF2207E378" }, { "b" : "
7FADDF46F000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7FADDF23D000", "path" : "/lib64/libk5crypto.so.3", "el
fType" : 3, "buildId" : "6C2243D37143F7FD1E16ED1F6CE4D7F16C2D7EF1" }, { "b" : "7FADDF027000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D
1" }, { "b" : "7FADDEE18000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "744272FEAAABCE629AB9E11FAA4A96AEBE8BC2B4" }, { "b" : "7FADDEC15000", "path" : "/lib64/libkeyu
tils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7FADDE9FB000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "47EC2C63132D25E4FE8
3F77023DA1A66457A88F1" }, { "b" : "7FADDE7DA000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf74fc9]
 mongod(+0xB74642) [0xf74642]
 mongod(+0xB749F6) [0xf749f6]
 libc.so.6(+0x35670) [0x7faddfbd8670]
 libc.so.6(gsignal+0x37) [0x7faddfbd85f7]
 libc.so.6(abort+0x148) [0x7faddfbd9ce8]
 mongod(_ZN5mongo12SecureRandom6createEv+0x1B9) [0xdacd29]
 mongod(_ZN5mongo31SaslSCRAMSHA1ServerConversation10_firstStepERSt6vectorISsSaISsEEPSs+0x16F2) [0x8ed8a2]
 mongod(_ZN5mongo31SaslSCRAMSHA1ServerConversation4stepERKNS_10StringDataEPSs+0x2F1) [0x8ee401]

It looks like it crashes at about 5k open connections.
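
To check that estimate against a whole log file, the connection count and the descriptor errors can be pulled out of the log lines. The following Python sketch assumes the 3.0-style "connection accepted" line format shown above; the helper name summarize is hypothetical:

```python
import re

# Matches the "(4887 connections now open)" suffix of "connection accepted" lines.
CONNECTIONS_RE = re.compile(r"\((\d+) connections? now open\)")


def summarize(log_lines):
    """Return (peak open connections, count of 'Too many open files' errors)."""
    peak = 0
    fd_errors = 0
    for line in log_lines:
        match = CONNECTIONS_RE.search(line)
        if match:
            peak = max(peak, int(match.group(1)))
        if "Too many open files" in line:
            fd_errors += 1
    return peak, fd_errors


# Lines copied from the log excerpt above.
sample = [
    "2016-02-22T21:07:53.557+0000 I NETWORK  [initandlisten] connection accepted from 10.178.82.244:36048 #8125 (4887 connections now open)",
    "2016-02-22T21:07:53.560+0000 I NETWORK  [initandlisten] connection accepted from 10.239.128.188:32840 #8126 (4888 connections now open)",
    "2016-02-22T21:07:53.577+0000 I NETWORK  [initandlisten] Listener: accept() returns -1 errno:24 Too many open files",
]

print(summarize(sample))  # (4888, 1)
```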

1) Here is the output of ulimit -a:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 491366
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 491366
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
 
2) cat /etc/security/limits.conf
*       soft    nofile  65536
*       hard    nofile  65536
*       soft    core unlimited
 
3) cat /etc/security/limits.d/90-nproc.conf
*          soft    nproc     65536
root       soft    nproc     unlimited

The file descriptor limit should be around 65k. Am I missing something here?
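
One thing to keep in mind is that resource limits are per-process: the values printed by ulimit -a describe the shell that ran the command, and a mongod started some other way (for example by an init script) may be running with different limits. As a small illustration, Python's standard resource module reports the open-files limit of the calling process only; the function name nofile_limits is just for this sketch:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limit of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


soft, hard = nofile_limits()
# resource.RLIM_INFINITY means "unlimited".
print(f"soft={soft} hard={hard}")
```

The numbers this prints say nothing about an already running mongod; for that, the limits of that specific process have to be inspected.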



 Comments   
Comment by Kelsey Schubert [ 24/Feb/16 ]

Hi dharshanr@scalegrid.net,

Please execute the following to determine the max open files for the running mongod process.

cat /proc/PIDNUMBER/limits

where PIDNUMBER is the process ID of your mongod.
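
If the check needs to be scripted, the same file can be parsed programmatically. This is a rough Python sketch that assumes the usual column layout of /proc/<pid>/limits on Linux (columns separated by runs of spaces); the function names and the sample values are hypothetical and are not taken from the reporter's system:

```python
import re
from pathlib import Path


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def open_files_limit(pid):
    """Read /proc/<pid>/limits and return the 'Max open files' (soft, hard) pair."""
    text = Path(f"/proc/{pid}/limits").read_text()
    return parse_limits(text)["Max open files"]


# Hypothetical sample in the layout of /proc/<pid>/limits.
sample = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max processes             1024                 491366               processes
Max open files            1024                 4096                 files
"""

print(parse_limits(sample)["Max open files"])  # ('1024', '4096')
```

Calling open_files_limit with the mongod process ID reports the values the running server actually has, which may differ from what ulimit -a shows in a login shell.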

Amazon Linux may impose a maximum process limit of 1024, which overrides the ulimit settings. Please follow the steps noted in our documentation here, and report back if the issue persists.

Also, please be aware that we recommend upgrading to 3.0.9 which contains a critical fix, SERVER-21275.

Thank you,
Thomas
