[SERVER-30072] Fatal error in replicaset index builder Created: 10/Jul/17  Updated: 20/Sep/17  Resolved: 28/Aug/17

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chad Kreimendahl Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

While running several copyDatabase operations concurrently, the error occurred. So far this has only been verified on 3.4.6 on Linux.

Participants:

 Description   

2017-07-07T15:09:03.839-0500 F -        [repl index builder 23862] Got signal: 6 (Aborted).
 
 0x5633b8a7f8a1 0x5633b8a7eab9 0x5633b8a7ef9d 0x7f1544cec890 0x7f1544967067 0x7f1544968448 0x5633b7d2ecf3 0x5633b8191a96 0x5633b89f2101 0x5633b94f3d30 0x7f1544ce5064 0x7f1544a1a62d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"5633B7515000","o":"156A8A1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5633B7515000","o":"1569AB9"},{"b":"5633B7515000","o":"1569F9D"},{"b":"7F1544CDD000","o":"F890"},{"b":"7F1544932000","o":"35067","s":"gsignal"},{"b":"7F1544932000","o":"36448","s":"abort"},{"b":"5633B7515000","o":"819CF3","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"5633B7515000","o":"C7CA96","s":"_ZN5mongo12IndexBuilder3runEv"},{"b":"5633B7515000","o":"14DD101","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"5633B7515000","o":"1FDED30"},{"b":"7F1544CDD000","o":"8064"},{"b":"7F1544932000","o":"E862D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.6", "gitVersion" : "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-4-amd64", "version" : "#1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)", "machine" : "x86_64" }, "somap" : [ { "b" : "5633B7515000", "elfType" : 3, "buildId" : "A103B8CEADAFC57DD867918614DCE184B9D877C2" }, { "b" : "7FFCD8F3E000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "FC9137B45D34B77AE9F781A05AA9CF5C3CD44D62" }, { "b" : "7F1545C19000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "21115992A1F885E1ACE88AADA60F126AD9759D03" }, { "b" : "7F154581D000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "32E9A5B9EED626E93DEEB00A49033F78652DB9A3" }, { "b" : "7F1545615000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141D2E13774B997C1CF0565" }, { "b" : "7F1545411000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7F1545110000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7F1544EFA000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : 
"D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7F1544CDD000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7F1544932000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7F1545E7A000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5633b8a7f8a1]
 mongod(+0x1569AB9) [0x5633b8a7eab9]
 mongod(+0x1569F9D) [0x5633b8a7ef9d]
 libpthread.so.0(+0xF890) [0x7f1544cec890]
 libc.so.6(gsignal+0x37) [0x7f1544967067]
 libc.so.6(abort+0x148) [0x7f1544968448]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5633b7d2ecf3]
 mongod(_ZN5mongo12IndexBuilder3runEv+0xD86) [0x5633b8191a96]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x131) [0x5633b89f2101]
 mongod(+0x1FDED30) [0x5633b94f3d30]
 libpthread.so.0(+0x8064) [0x7f1544ce5064]
 libc.so.6(clone+0x6D) [0x7f1544a1a62d]
-----  END BACKTRACE  -----
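
For readers cross-checking the trace: each entry in the JSON "backtrace" array appears to carry a load base ("b") and an offset ("o"), and their sum matches the absolute address printed in the human-readable frames. The sketch below illustrates that arithmetic on two frames copied from the trace above. The function name frame_addresses is hypothetical and is not part of any MongoDB tooling.

```python
import json


def frame_addresses(backtrace_json):
    """Return (absolute_address_hex, symbol_or_None) for each backtrace frame.

    Assumes each frame has hex strings "b" (module base) and "o" (offset),
    as in the trace above; "s" (mangled symbol) is optional.
    """
    doc = json.loads(backtrace_json)
    frames = []
    for frame in doc["backtrace"]:
        address = int(frame["b"], 16) + int(frame["o"], 16)
        frames.append((hex(address), frame.get("s")))
    return frames


# Two frames copied from the trace above (the rest are omitted for brevity).
sample = (
    '{"backtrace":['
    '{"b":"5633B7515000","o":"156A8A1","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"5633B7515000","o":"C7CA96","s":"_ZN5mongo12IndexBuilder3runEv"}'
    ']}'
)

for address, symbol in frame_addresses(sample):
    print(address, symbol)
```

The second frame's mangled name, _ZN5mongo12IndexBuilder3runEv, demangles to mongo::IndexBuilder::run() (for example with c++filt), which is the frame that called the fassert.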



 Comments   
Comment by Kelsey Schubert [ 19/Jul/17 ]

Hi sallgeud,

Sorry for the delay getting back to you. From the log files, I can see that these crashes are the result of hitting a "Too many open files" system limit. We're aware that these replica sets may have hundreds of thousands of active data handles, so unfortunately this type of error is not unexpected. As you're aware, we have work scheduled to reduce the number of open files required for your schema and workload. For now, I would recommend reconfirming that your ulimits are set appropriately.

Kind regards,
Thomas
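
As a rough aid for the ulimit check suggested above, here is a minimal sketch, assuming a Linux host with /proc mounted. It reads the open-file limits of a running process and compares them with the number of descriptors currently open. All function names are hypothetical, and the pid would need to be that of the mongod process, since limits set in a login shell do not necessarily apply to a service.

```python
import os
import resource


def parse_max_open_files(limits_text):
    """Extract (soft, hard) as strings from the 'Max open files' row of
    /proc/<pid>/limits. Values may be numbers or the word 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            return soft, hard
    return None


def process_nofile_limits(pid):
    """Read the open-file limits that apply to an already running process."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_max_open_files(handle.read())


def open_fd_count(pid):
    """Count descriptors currently open by listing /proc/<pid>/fd.
    Needs permission to inspect the target process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Demonstrated on the current process; substitute the mongod pid in practice.
    pid = os.getpid()
    print("limits from /proc:", process_nofile_limits(pid))
    print("limits from getrlimit:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("open descriptors:", open_fd_count(pid))
```

If the open-descriptor count is close to the soft limit, the "Too many open files" failure described above becomes plausible; how much headroom is enough depends on the schema and workload.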

Comment by Chad Kreimendahl [ 19/Jul/17 ]

Uploaded. It happened in 3.4.5 for us in the previous few days, so I've uploaded the 3.4.5 logs as well.

Comment by Chad Kreimendahl [ 10/Jul/17 ]

Oops.. nm. See it now

Comment by Chad Kreimendahl [ 10/Jul/17 ]

Sure. Send me the secure upload link.

Comment by Kelsey Schubert [ 10/Jul/17 ]

Additionally, would you please provide the diagnostic.data? It may help us rule out some other theories that would explain this behavior.

Thanks again,
Thomas

Comment by Kelsey Schubert [ 10/Jul/17 ]

Hi sallgeud,

Thanks for the report. To help us investigate, would you please upload the complete log file that includes the fassert? I've created a secure upload portal for you to use.

Kind regards,
Thomas
