[SERVER-36063] MongoDB crashed with Signal 6 (Aborted) Created: 11/Jul/18  Updated: 27/Dec/18  Resolved: 17/Nov/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.13
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kamil Kulak
Assignee: Kelsey Schubert
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi,

Our MongoDB clusters recently crashed in two separate environments with the same error message. Details are below:

2018-07-11T04:07:43.128+0000 E STORAGE  [thread2] WiredTiger error (62) [1531282063:127984][2808:0x7f66d81a2700], file:WiredTiger.wt, WT_SESSION.checkpoint: /data/WiredTiger.turtle: handle-open: open: Timer expired
 
2018-07-11T04:07:43.128+0000 E STORAGE  [thread2] WiredTiger error (62) [1531282063:128202][2808:0x7f66d81a2700], checkpoint-server: checkpoint server error: Timer expired
 
2018-07-11T04:07:43.128+0000 E STORAGE  [thread2] WiredTiger error (-31804) [1531282063:128228][2808:0x7f66d81a2700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
 
2018-07-11T04:07:43.128+0000 I -        [thread2] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
 
2018-07-11T04:07:43.128+0000 I -        [thread2] 
***aborting after fassert() failure
 
2018-07-11T04:07:43.128+0000 I -        [conn6891995] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
 
2018-07-11T04:07:43.128+0000 I -        [conn6891995] 
***aborting after fassert() failure
 
2018-07-11T04:07:43.147+0000 I -        [WTJournalFlusher] Fatal Assertion 28559 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 64
 
2018-07-11T04:07:43.147+0000 I -        [WTJournalFlusher] 
***aborting after fassert() failure
 
2018-07-11T04:07:43.199+0000 F -        [thread2] Got signal: 6 (Aborted).
 0x5606d175b5b1 0x5606d175a7c9 0x5606d175acad 0x7f66de955690 0x7f66de5af277 0x7f66de5b0968 0x5606d09ff597 0x5606d146a316 0x5606d0a09c3e 0x5606d0a09e5a 0x5606d0a0a0bc 0x5606d20bc3b3 0x7f66de94dde5 0x7f66de677bad
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"5606D01E2000","o":"15795B1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"5606D01E2000","o":"15787C9"},{"b":"5606D01E2000","o":"1578CAD"},{"b":"7F66DE946000","o":"F690"},{"b":"7F66DE579000","o":"36277","s":"gsignal"},{"b":"7F66DE579000","o":"37968","s":"abort"},{"b":"5606D01E2000","o":"81D597","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"5606D01E2000","o":"1288316"},{"b":"5606D01E2000","o":"827C3E","s":"__wt_eventv"},{"b":"5606D01E2000","o":"827E5A","s":"__wt_err"},{"b":"5606D01E2000","o":"8280BC","s":"__wt_panic"},{"b":"5606D01E2000","o":"1EDA3B3"},{"b":"7F66DE946000","o":"7DE5"},{"b":"7F66DE579000","o":"FEBAD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.13", "gitVersion" : "fbdef2ccc53e0fcc9afb570063633d992b2aae42", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.14.47-56.37.amzn1.x86_64", "version" : "#1 SMP Wed Jun 6 18:49:01 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "5606D01E2000", "elfType" : 3, "buildId" : "0B8D59C7E131539CC482C89F0220B0866123E74F" }, { "b" : "7FFE87344000", "elfType" : 3, "buildId" : "2DF3D53B81C4CFBB2F14430578A041B78D5A1EE2" }, { "b" : "7F66DF8E4000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "9C4EB34A346260F2A77746F4E5ED837619137DB7" }, { "b" : "7F66DF486000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "EC480B38432587A9B21BFBD917EF020731EBD2CF" }, { "b" : "7F66DF27E000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "F2701E2A24459D5B55DF5549D585F091E7BCF07A" }, { "b" : "7F66DF07A000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "0E5CD5BAA5EE8BF3648A5031B088F9A78C89364F" }, { "b" : "7F66DED78000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "07FB92AFEF1756F093371CE60C3AE85DD3A06325" }, { "b" : "7F66DEB62000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A03C9A80E995ED5F43077AB754A258FA0E34C3CD" }, { "b" : "7F66DE946000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, 
"buildId" : "D973C39D1900DC61D8519C653C3BC405692DE563" }, { "b" : "7F66DE579000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "AF310F56618FC1EF9158973484F60942F11CC0FB" }, { "b" : "7F66DFB55000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8402047FD4A85B3CD1142346EA06BCD6E15A82A3" }, { "b" : "7F66DE32C000", "path" : "/usr/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "6FBBD34B86296FDF883FE5122017EC5CD3F98ED7" }, { "b" : "7F66DE044000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "76429E6FD408BBB675798D6458F2735383710D0B" }, { "b" : "7F66DDE41000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8" }, { "b" : "7F66DDC0E000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "5B2A76F1EF91EDAA0494BE680CADAFE6489326E1" }, { "b" : "7F66DD9F8000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F66DD7EA000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "3ACB59488C6D8DE0A1F4F1B0C290A570D9E42F3D" }, { "b" : "7F66DD5E7000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F66DD3CE000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "9E5E0BF5F22DE7555BC4B9853240817147489258" }, { "b" : "7F66DD1AD000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5606d175b5b1]
 mongod(+0x15787C9) [0x5606d175a7c9]
 mongod(+0x1578CAD) [0x5606d175acad]
 libpthread.so.0(+0xF690) [0x7f66de955690]
 libc.so.6(gsignal+0x37) [0x7f66de5af277]
 libc.so.6(abort+0x148) [0x7f66de5b0968]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5606d09ff597]
 mongod(+0x1288316) [0x5606d146a316]
 mongod(__wt_eventv+0x3D7) [0x5606d0a09c3e]
 mongod(__wt_err+0x9D) [0x5606d0a09e5a]
 mongod(__wt_panic+0x2E) [0x5606d0a0a0bc]
 mongod(+0x1EDA3B3) [0x5606d20bc3b3]
 libpthread.so.0(+0x7DE5) [0x7f66de94dde5]
 libc.so.6(clone+0x6D) [0x7f66de677bad]
-----  END BACKTRACE  -----

We see a similar backtrace on the other nodes.
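
For triage, the WiredTiger error codes in a log excerpt like the one above can be pulled out with a small script. This is only a sketch: the regular expression is inferred from the three "WiredTiger error" lines quoted in this ticket, not from any documented log grammar, and the function name parse_wt_error is hypothetical.

```python
import re

# Pattern inferred from the log lines quoted in this ticket (an assumption,
# not an official mongod log format definition).
WT_ERROR_LINE = re.compile(
    r"^(?P<timestamp>\S+) (?P<severity>\S) (?P<component>\S+)\s+"
    r"\[(?P<thread>[^\]]+)\] WiredTiger error \((?P<code>-?\d+)\) "
    r"\[[^\]]*\]\[[^\]]*\], (?P<message>.*)$"
)


def parse_wt_error(line):
    """Return a dict describing a WiredTiger error line, or None if no match."""
    match = WT_ERROR_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])  # e.g. 62 or -31804
    return fields


sample_lines = [
    "2018-07-11T04:07:43.128+0000 E STORAGE  [thread2] WiredTiger error (62) "
    "[1531282063:127984][2808:0x7f66d81a2700], file:WiredTiger.wt, "
    "WT_SESSION.checkpoint: /data/WiredTiger.turtle: handle-open: open: Timer expired",
    "2018-07-11T04:07:43.128+0000 E STORAGE  [thread2] WiredTiger error (-31804) "
    "[1531282063:128228][2808:0x7f66d81a2700], checkpoint-server: the process must "
    "exit and restart: WT_PANIC: WiredTiger library panic",
    "2018-07-11T04:07:43.199+0000 F -        [thread2] Got signal: 6 (Aborted).",
]

for raw in sample_lines:
    parsed = parse_wt_error(raw)
    if parsed is not None:
        print(parsed["thread"], parsed["code"], parsed["message"])
```

Lines that are not WiredTiger errors, such as the "Got signal" line, return None and are skipped.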

Timeline:

ip-10-120-28-149: 02:18:18.276 [....] Got signal: 6 (Aborted).
ip-10-120-28-115: 03:16:17.437 [....] Got signal: 6 (Aborted).
ip-10-120-28-168: 04:07:43.199 [....] Got signal: 6 (Aborted).
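
As a rough aid for reading the timeline above, the intervals between the three aborts can be computed from the timestamps. This is a minimal sketch: it assumes all three timestamps fall on the same day (only the last node's date is confirmed by the log excerpt), and the names crashes and gaps_in_seconds are illustrative.

```python
from datetime import datetime

# Host and time of each abort, copied from the timeline above.
crashes = [
    ("ip-10-120-28-149", "02:18:18.276"),
    ("ip-10-120-28-115", "03:16:17.437"),
    ("ip-10-120-28-168", "04:07:43.199"),
]


def gaps_in_seconds(events):
    """Return the elapsed seconds between consecutive events (same-day assumption)."""
    times = [datetime.strptime(ts, "%H:%M:%S.%f") for _, ts in events]
    return [
        round((later - earlier).total_seconds(), 3)
        for earlier, later in zip(times, times[1:])
    ]


for (host, _), gap in zip(crashes[1:], gaps_in_seconds(crashes)):
    print(f"{host} aborted {gap:.3f} s after the previous node")
```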

Environment:

The MongoDB cluster (a 3-node replica set) runs on AWS infrastructure.

[ec2-user@ip-10-120-28-149 ~]$ uname -a
Linux ip-10-120-28-149 4.14.47-56.37.amzn1.x86_64 #1 SMP Wed Jun 6 18:49:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[ec2-user@ip-10-120-28-149 ~]$ mongo --version
MongoDB shell version v3.4.13
git version: fbdef2ccc53e0fcc9afb570063633d992b2aae42
OpenSSL version: OpenSSL 1.0.0-fips 29 Mar 2010
allocator: tcmalloc
modules: none
build environment:
    distmod: amazon
    distarch: x86_64
    target_arch: x86_64



 Comments   
Comment by Kelsey Schubert [ 17/Nov/18 ]

Hi tesladev,

Sorry this slipped through the cracks. Looking at the syslogs, it appears that the antivirus program you are running interacts poorly with mongod when it scans mongod's files as they are being accessed. I would recommend setting up an exception for all files in the dbpath.

Kind regards,
Kelsey

Comment by Kamil Kulak [ 12/Jul/18 ]

Hi Bruce,

I've uploaded syslogs and mongod logs covering the time of failure on one of our nodes (10.120.28.168). Could you explain what kind of information is stored in diagnostic.data? We're a bit concerned about sensitive data that might be stored under the hood in that file.

Thanks, Kamil 

Comment by Bruce Lucas (Inactive) [ 11/Jul/18 ]

Hi Kamil,

"Timer expired" (errno 62, ETIME) is an unusual error code, and in fact I can't find any reports in JIRA of mongod ever failing with that error code. That, and the fact that in the log snippet you've posted two separate threads failed with this error code in completely unrelated places, makes me suspect a system issue.
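
For anyone who wants to confirm the mapping between the number and the name on their own host, a short check along these lines should work. It is only a sketch: errno numbering is platform-specific (62 is the Linux value reported in this ticket), and describe_errno is a hypothetical helper name.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Look ETIME up by name instead of hard-coding 62, since the number
# differs between operating systems.
if hasattr(errno, "ETIME"):
    print(errno.ETIME, describe_errno(errno.ETIME))
```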

Can you please upload the complete mongod log files, archived contents of $dbpath/diagnostic.data, and syslog (/var/log/messages*) covering the time of the failures? You can upload this information to this secure private portal.

Thanks,
Bruce
 

Generated at Thu Feb 08 04:41:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.