[SERVER-36063] MongoDB crashed with Signal 6 (Aborted) Created: 11/Jul/18 Updated: 27/Dec/18 Resolved: 17/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.13 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kamil Kulak | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi, our MongoDB clusters crashed recently in two separate environments with the same error message. Please find details below:
We can see a similar backtrace on other nodes. Timeline:
Environment: The MongoDB cluster (a 3-node replica set) is running on AWS infrastructure.
|
| Comments |
| Comment by Kelsey Schubert [ 17/Nov/18 ] |
|
Hi tesladev, Sorry this slipped through the cracks. Looking at the syslogs, it appears that there is a poor interaction between mongod and the antivirus program you are running, which scans mongod files as they are being accessed. I would recommend setting up an exception for any files in the dbpath. Kind regards, |
| Comment by Kamil Kulak [ 12/Jul/18 ] |
|
Hi Bruce, I've uploaded syslogs and mongod logs covering the time of failure on one of our nodes (10.120.28.168). Could you explain what kind of information is stored in diagnostic.data? We're a bit concerned about sensitive data that might be stored under the hood in that file. Thanks, Kamil |
| Comment by Bruce Lucas (Inactive) [ 11/Jul/18 ] |
|
Hi Kamil, "Timer expired" (errno 62, ETIME) is an unusual error code, and in fact I can't find any reports in JIRA of mongod ever failing with that error code. That, and the fact that in the log snippet you've pasted two separate threads failed with this error code in completely unrelated places, makes me suspect a system issue. Can you please upload the complete mongod log files, archived contents of $dbpath/diagnostic.data, and syslog (/var/log/messages*) covering the time of the failures? You can upload this information to this secure private portal. Thanks, |
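Editorial note: the mapping between the numeric error code and the "Timer expired" message mentioned above can be checked on the affected host. The following is a minimal sketch, not part of the original ticket; the helper name describe_errno is hypothetical, and the numeric value of ETIME is platform-dependent (62 is the Linux value quoted in the comment).

```python
import errno
import os


def describe_errno(code: int) -> tuple[str, str]:
    """Return (symbolic name, OS message) for a numeric errno value.

    describe_errno is a hypothetical helper written for illustration only.
    """
    # errno.errorcode maps numeric values to symbolic names such as "ETIME".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's human-readable message for the code.
    message = os.strerror(code)
    return name, message


if __name__ == "__main__":
    # Use errno.ETIME rather than a hard-coded 62, because the numeric
    # value differs between platforms.
    name, message = describe_errno(errno.ETIME)
    print(errno.ETIME, name, message)
```

Running this on the node in question shows which symbolic name and message the local C library associates with the code that appears in the mongod log.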