[SERVER-75033] Capture core dumps from test failures on macOS Created: 10/Apr/17 Updated: 07/Feb/24 Resolved: 20/Dec/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.3.0-rc0, 7.0.5, 6.0.13, 7.0.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | ADAM Martin (Inactive) | Assignee: | Trevor Guidry |
| Resolution: | Fixed | Votes: | 4 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Server Development Platform |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.2, v7.0, v6.0 |
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 24 |
| Description |
|
When macOS fuzz tests fail by crashing the server, core dumps for mongod are unavailable. |
| Comments |
| Comment by Githook User [ 02/Jan/24 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: GitOrigin-RevId: a7261e5844d2a0a7e1146753b569d8d5bdefa82e | ||
| Comment by Githook User [ 02/Jan/24 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: GitOrigin-RevId: d1a7b5ad155ade356af61b3f5d5835ba46119fab | ||
| Comment by Githook User [ 15/Dec/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: GitOrigin-RevId: 0a181cf0e0488fc279b9da65fe9f3b0be9b48b27 | ||
| Comment by Githook User [ 14/Dec/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: Revert " This reverts commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab. GitOrigin-RevId: 46a18f2bf208191c2e48357043a879de3f2435b6 | ||
| Comment by Githook User [ 12/Dec/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: Revert " This reverts commit e2d9f91f379b5fa1b3d5c13915576804d3aaca4e. GitOrigin-RevId: a9a856557c0aa139f77bb284f6c6feff3c79b6ee | ||
| Comment by Githook User [ 12/Dec/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: Revert " This reverts commit 465072730cce363cfa440c770e5aebc550c2b44d. GitOrigin-RevId: 077ab9fe2634224dc3656592a2aa17f5c27b2273 | ||
| Comment by Trevor Guidry [ 12/Dec/23 ] | ||
|
Some of these commits are being reverted because of issues with the implementation on mongodb-mongo-master-nightly and on older versions. This is still being worked on. | ||
| Comment by Githook User [ 30/Nov/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: Revert " This reverts commit 7bfc730dd6d825f7c0c5a26971f7b449f89b2a01. | ||
| Comment by Githook User [ 21/Nov/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: (cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab) | ||
| Comment by Githook User [ 21/Nov/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: (cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab) | ||
| Comment by Githook User [ 21/Nov/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: (cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab) | ||
| Comment by Githook User [ 21/Nov/23 ] | ||
|
Author: Trevor Guidry (trevor.guidry@mongodb.com)
Message: | ||
| Comment by Trevor Guidry [ 09/Nov/23 ] | ||
|
I investigated this, and it turns out getting core dumps is easier than we thought! Here is a macOS patch build with core dumps.
The following entitlement was required to keep macOS from blocking us from collecting core dumps and attaching with lldb. | ||
| Comment by Daniel Moody [ 08/Nov/23 ] | ||
tommaso.tocci@mongodb.com I wasn't aware that #1 was also part of the problem; I could investigate why crashes are not producing cores.
I did not. This seems like a complex approach (I don't think there is existing test infrastructure around it), but if it is the only course of action, we could pursue it; I think it's feasible. | ||
| Comment by Daniel Gomez Ferro [ 08/Nov/23 ] | ||
|
daniel.moody@mongodb.com did you get to try out this idea?
alex.neben@mongodb.com at some point, the idea of assembling a cross-functional team to look at this issue was floated, with people from the SDP team, the BUILD team, and perhaps others. Can we go ahead with it? | ||
| Comment by Tommaso Tocci [ 06/Nov/23 ] | ||
|
daniel.moody@mongodb.com if I understood correctly, we have two orthogonal issues here:
While I understand that we are unable to solve 2, I'm wondering if we could at least solve 1. | ||
| Comment by Daniel Moody [ 03/Nov/23 ] | ||
|
tommaso.tocci@mongodb.com regarding your link, the issue is not generating a core dump; it is getting permission for the hang analyzer to attach to a live (hung) process and generate the core dump through the debugger. | ||
| Comment by Daniel Moody [ 03/Nov/23 ] | ||
|
The problem is that macOS, as a security measure, will not let non-tty shells attach to a process (if you attach, you can see all the contents of memory, including sensitive information, and the assumption is that only a live tty would be a developer debugging some process). The evergreen agent runs these processes locally from its agent daemon process. If you ssh to a macOS host, you can attach to processes without a problem, as long as you have the permissions to. I could not find a way to get the evergreen agent to attach to a process; I tried a lot of different things, and the devprod infrastructure team helped with permissions however they could, but it was unsuccessful. If you have any ideas, we can certainly keep trying. | ||
| Comment by Tommaso Tocci [ 03/Nov/23 ] | ||
|
alex.neben@mongodb.com did we also try something like https://nasa.github.io/trick/howto_guides/How-to-dump-core-file-on-MacOS.html ? | ||
| Comment by Alex Neben [ 27/Oct/23 ] | ||
|
Sorry, I should have been clearer: according to investigation daniel.moody@mongodb.com has done, this is impossible because of macOS permission issues. We can leave it open, but so far we have been unable to solve this problem. | ||
| Comment by Kaitlin Mahar [ 27/Oct/23 ] | ||
|
alex.neben@mongodb.com, iryna.zhuravlova@mongodb.com there has been a good amount of interest in this ticket from server engineers over the years, based on the comment history and number of watchers, and in my own experience this has been a pain point in recent BFs and has been brought up by my teammates in replication team retrospectives this year. | ||
| Comment by Pierlauro Sciarelli [ 25/Aug/23 ] | ||
Any update on this? I'm asking because we still very often get BFs from macOS variants that are nearly impossible to investigate without core dumps (e.g. BF-29687). macOS seems slow/different enough to trigger race conditions that we rarely hit on other variants. | ||
| Comment by Daniel Moody [ 20/Jun/23 ] | ||
|
Recently I had an idea: even though macOS permissions block the evergreen agent from attaching to processes, could we somehow ssh in as a real user, so that there is a real interactive terminal, and get the core dumps that way? | ||
| Comment by Will Korteland [ 15/Jun/23 ] | ||
|
Ouch, thanks for the heads up alex.neben@mongodb.com. | ||
| Comment by Alex Neben [ 13/Jun/23 ] | ||
|
FYI this might not be possible. See discussion here: https://jira.mongodb.org/browse/SERVER-68902?focusedCommentId=4862069&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4862069 | ||
| Comment by Max Hirschhorn [ 02/Feb/23 ] | ||
|
Flagging this ticket to get another look from the SDP team because the lack of a core dump came up again in BF-27618. What Drew said above is fully accurate. All platforms should always produce core dumps when testing in Evergreen. | ||
| Comment by Andrew Morrow (Inactive) [ 01/Mar/22 ] | ||
|
iryna.zhuravlova - I don't really understand that last comment. This isn't about getting core dumps for any "production" deployments of MongoDB on macOS, but for having core dumps for crashes on macOS in CI. Clearly, having core dumps would make it easier to debug crashes, so I don't see why we wouldn't still want to do this work. CC robert.guo. | ||
| Comment by Iryna Zhuravlova [ 08/Feb/22 ] | ||
|
Nobody is running macOS in production. robert.guo will sync with people and revisit if the issue still persists. | ||
| Comment by Brooke Miller [ 07/Feb/22 ] | ||
|
I'm going to bump this back to 'Needs Scheduling' since I saw the discussion around this in #buildbaron. In the future, please 'Flag for Scheduling' to make sure the team properly re-evaluates the request. | ||
| Comment by Kyle Suarez [ 07/Feb/22 ] | ||
|
Would have been nice to have a core dump for BF-24048. Now that MMAPv1 is gone, is it feasible to enable core dumps for the server, or is there still a technical limitation? | ||
| Comment by Robert Guo (Inactive) [ 10/Jun/20 ] | ||
|
Thanks for the info Mark! It does look like a dupe, but for some reason not all Mac hosts have core dumps enabled, e.g. this patch build (there's no "Enabling coredumps" log line). Richard's change in | ||
| Comment by Mark Benvenuto [ 09/Jun/20 ] | ||
|
I added collection on the server-side with | ||
| Comment by Jonathan Abrahams [ 20/Apr/17 ] | ||
|
Stack Overflow post for OS X coredump filter: if we want core dumps on OS X, we cannot filter their contents. So if the size is too large (as with MMAPv1), we'll have to keep them disabled. | ||
| Comment by Jonathan Abrahams [ 20/Apr/17 ] | ||
|
The default location for the core files on OS X is /cores/core.<PID> | ||
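Given that default location, a post-test step could look for cores belonging to the processes a test started. A minimal Python sketch, assuming the default `/cores/core.<PID>` naming has not been changed on the host; `core_path_for_pid` and `find_core_dumps` are hypothetical helpers, not part of resmoke or Evergreen.

```python
from pathlib import Path
from typing import Iterable, List

# Default OS X core file directory, per the comment above (assumed unchanged).
DEFAULT_CORES_DIR = Path("/cores")


def core_path_for_pid(pid: int, cores_dir: Path = DEFAULT_CORES_DIR) -> Path:
    """Return the expected core file path for a process, e.g. /cores/core.1234."""
    return cores_dir / f"core.{pid}"


def find_core_dumps(pids: Iterable[int], cores_dir: Path = DEFAULT_CORES_DIR) -> List[Path]:
    """Return only the core files that actually exist for the given PIDs."""
    candidates = (core_path_for_pid(pid, cores_dir) for pid in pids)
    return [path for path in candidates if path.is_file()]
```

An empty result for a PID that is known to have crashed would suggest the core was never written, rather than written but not uploaded.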
| Comment by Jonathan Abrahams [ 20/Apr/17 ] | ||
|
We can control the core pattern on OS X, like Linux:
The Linux pattern is specified as:
| ||
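To check which core file pattern a given host is actually using, a small Python sketch could read it back from the kernel. It assumes, beyond what the comment states, that OS X exposes the pattern through the `kern.corefile` sysctl and Linux through `/proc/sys/kernel/core_pattern`; `read_core_pattern` is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Linux exposes the core pattern as a procfs file.
LINUX_CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def read_core_pattern(platform: str = sys.platform) -> Optional[str]:
    """Return the kernel's core file pattern, or None if it cannot be read."""
    try:
        if platform == "darwin":
            # Assumption: OS X exposes the pattern as the kern.corefile sysctl.
            result = subprocess.run(
                ["sysctl", "-n", "kern.corefile"],
                capture_output=True,
                text=True,
                check=True,
            )
            return result.stdout.strip()
        if platform.startswith("linux"):
            return LINUX_CORE_PATTERN.read_text().strip()
    except (OSError, subprocess.CalledProcessError):
        # Missing sysctl binary, unreadable procfs entry, or non-zero exit.
        return None
    # Other platforms are not handled by this sketch.
    return None
```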
| Comment by Max Hirschhorn [ 20/Apr/17 ] | ||
|
We should figure out whether we are able to generate core dumps on OS X or whether the issue is that we simply aren't uploading them to S3 as part of etc/evergreen.yml. Once we make that determination we can figure out what the necessary BUILD and/or SERVER ticket work is to capture them. |
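Core dumps are only written if the crashing process has a non-zero core file size limit, so one part of that determination is whether the test harness raises the limit for the processes it spawns. A minimal Python sketch, assuming the harness starts mongod via `subprocess`; `enable_core_dumps` and `spawn_with_core_dumps` are hypothetical helpers, not resmoke's actual code.

```python
import resource
import subprocess
from typing import Sequence, Tuple


def enable_core_dumps() -> Tuple[int, int]:
    """Raise the soft core file size limit to the hard limit.

    An unprivileged process may raise its soft limit up to (but not past)
    its hard limit, so this needs no elevated permissions. Returns the
    resulting (soft, hard) limits.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def spawn_with_core_dumps(argv: Sequence[str]) -> subprocess.Popen:
    """Start a child process with its core file limit raised.

    preexec_fn runs in the child after fork() and before exec(), so the
    change applies to the child without touching the parent's limits.
    """
    return subprocess.Popen(list(argv), preexec_fn=enable_core_dumps)
```

Calling `enable_core_dumps()` in the harness process itself would also work, since resource limits are inherited across fork and exec; the `preexec_fn` variant just confines the change to the child. Python's documentation warns that `preexec_fn` is not safe when the parent has threads. On OS X this addresses only the limit, not the attach/entitlement restrictions discussed in other comments.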