[SERVER-43674] MongoDB crash with "Got signal: 11 (Segmentation fault)."
| Created: | 27/Sep/19 | Updated: | 11/Dec/19 | Resolved: | 11/Dec/19 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jiri Sula | Assignee: | Sulabh Mahajan |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Mongo 3.6.7, git version: 2628472127e9f1826e02c665c1d93880a204075e |
| Operating System: | ALL |
| Sprint: | Storage Engines 2019-11-18, Storage Engines 2019-12-02, Storage Engines 2019-12-16 |
| Story Points: | 5 |
| Description |

Good day! A MongoDB node crashed with "Got signal: 11 (Segmentation fault)." There are no traces implicating WireShark (although SERVER-31236, https://jira.mongodb.org/browse/SERVER-31236, looks very familiar). Thank you!
| Comments |
| Comment by Sulabh Mahajan [ 11/Dec/19 ] |

I had a look at this bug report. It appears possible that a data handle (dhandle) that had already been swept was somehow accessed incorrectly. However, the ticket contains no additional information for debugging this issue. Unfortunately, the log file is not useful in this case; a core dump from the crash might have helped. We have not seen this issue reported elsewhere. Given how rarely this bug occurs and the lack of debugging information, I am inclined to close this ticket for now. We will keep watching our internal testing and will provide an update if the issue appears there. Please feel free to reopen the ticket if the crash happens again or if you obtain more information about the original crash.
| Comment by Louis Williams [ 17/Oct/19 ] |

The failure is here, which is a macro that expands to:
This was part of a killCursors command. I did the pointer arithmetic, and the 0x88 offset is too low to be the "dhandle" pointer on the "session", so I believe the invalid pointer is the "handle" pointer on the "dhandle". I wonder whether a dhandle sweep invalidated the handle used by this session, causing the cursor reset to fail. I'm going to send this to Storage Engines to take a look.
| Comment by Louis Williams [ 17/Oct/19 ] |

daniel.hatcher, both failures seem related to our cursor caching mechanism, but I can't say whether they are the same problem.
| Comment by Vaclav Bilek [ 03/Oct/19 ] |

It's on different physical hardware, on the primary replicas of a sharded cluster. We don't have logs going back to startup; the uploaded log covers the day of the crash.
| Comment by Danny Hatcher (Inactive) [ 30/Sep/19 ] |

jiri.sula@livesport.eu, as I mentioned on
| Comment by Jiri Sula [ 27/Sep/19 ] |

I can't believe I mentioned WireShark; all those wired animals... the weekend is coming. So far there are no messages implicating the storage engine, which is why I didn't fill in the Component field. Thanks!