[SERVER-75033] Capture core dumps from test failures on macOS Created: 10/Apr/17  Updated: 07/Feb/24  Resolved: 20/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0, 7.0.5, 6.0.13, 7.0.6

Type: Bug Priority: Major - P3
Reporter: ADAM Martin (Inactive) Assignee: Trevor Guidry
Resolution: Fixed Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-37462 Archive core dumps on macOS hosts in ... Closed
Duplicate
duplicates SERVER-68902 Core dumps aren't uploaded when C++ u... Closed
Related
Assigned Teams:
Server Development Platform
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.2, v7.0, v6.0
Participants:
Linked BF Score: 24

 Description   

In macOS fuzz test failures that crash the server, core dumps for mongod are unavailable.

Example:
https://evergreen.mongodb.com/task/mongodb_mongo_master_osx_1010_ssl_jstestfuzz_concurrent_sharded_WT_2e73d280774aa6924c9ef2b8a1ecd8681f7b9477_17_04_07_22_12_08



 Comments   
Comment by Githook User [ 02/Jan/24 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

GitOrigin-RevId: a7261e5844d2a0a7e1146753b569d8d5bdefa82e
(cherry picked from commit c66f90f4255dffcdda6363bac033aef094a5e084)
Branch: v6.0
https://github.com/mongodb/mongo/commit/63cce1dc16e42bfa83512afcd2d8a96ce1db1a62

Comment by Githook User [ 02/Jan/24 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

GitOrigin-RevId: d1a7b5ad155ade356af61b3f5d5835ba46119fab
(cherry picked from commit c66f90f4255dffcdda6363bac033aef094a5e084)
Branch: v7.0
https://github.com/mongodb/mongo/commit/b7b797dd4220cd33309c3412fdbf435fd27bfb97

Comment by Githook User [ 15/Dec/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

GitOrigin-RevId: 0a181cf0e0488fc279b9da65fe9f3b0be9b48b27
Branch: master
https://github.com/mongodb/mongo/commit/c66f90f4255dffcdda6363bac033aef094a5e084

Comment by Githook User [ 14/Dec/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: Revert "SERVER-75033 Capture core dumps from test failures on macOS"

This reverts commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab.

GitOrigin-RevId: 46a18f2bf208191c2e48357043a879de3f2435b6
Branch: master
https://github.com/mongodb/mongo/commit/2734f82d7d09697e1fb905caf632ff77aac5c901

Comment by Githook User [ 12/Dec/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: Revert "SERVER-75033 Capture core dumps from test failures on macOS"

This reverts commit e2d9f91f379b5fa1b3d5c13915576804d3aaca4e.

GitOrigin-RevId: a9a856557c0aa139f77bb284f6c6feff3c79b6ee
Branch: v7.0
https://github.com/mongodb/mongo/commit/4e2d6961e6d3d5828e84daab98977b6cce136dc8

Comment by Githook User [ 12/Dec/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: Revert "SERVER-75033 Capture core dumps from test failures on macOS"

This reverts commit 465072730cce363cfa440c770e5aebc550c2b44d.

GitOrigin-RevId: 077ab9fe2634224dc3656592a2aa17f5c27b2273
Branch: v6.0
https://github.com/mongodb/mongo/commit/44f2904df21ca05a2a33640b3fc789a7548286c1

Comment by Trevor Guidry [ 12/Dec/23 ]

Some versions of this are getting reverted because of some issues with the implementation on mongodb-mongo-master-nightly and older versions. This is still being worked on.

Comment by Githook User [ 30/Nov/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: Revert "SERVER-75033 Capture core dumps from test failures on macOS"

This reverts commit 7bfc730dd6d825f7c0c5a26971f7b449f89b2a01.
Branch: v7.2
https://github.com/mongodb/mongo/commit/fd5060e6b01c356b0cd880801c942fbd80ee4bc6

Comment by Githook User [ 21/Nov/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

(cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab)
Branch: v7.2
https://github.com/mongodb/mongo/commit/7bfc730dd6d825f7c0c5a26971f7b449f89b2a01

Comment by Githook User [ 21/Nov/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

(cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab)
Branch: v6.0
https://github.com/mongodb/mongo/commit/465072730cce363cfa440c770e5aebc550c2b44d

Comment by Githook User [ 21/Nov/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS

(cherry picked from commit d6072dc2c6a08dca78ece915ad2868dcfb5c26ab)
Branch: v7.0
https://github.com/mongodb/mongo/commit/e2d9f91f379b5fa1b3d5c13915576804d3aaca4e

Comment by Githook User [ 21/Nov/23 ]

Author:

{'name': 'Trevor Guidry', 'email': 'trevor.guidry@mongodb.com', 'username': ''}

Message: SERVER-75033 Capture core dumps from test failures on macOS
Branch: master
https://github.com/mongodb/mongo/commit/d6072dc2c6a08dca78ece915ad2868dcfb5c26ab

Comment by Trevor Guidry [ 09/Nov/23 ]

I did some investigation on this and it turned out that getting core dumps is easier than we thought! Here is a macOS patch build with core dumps.

 

The following entitlement was required to keep macOS from blocking us when generating core dumps and attaching with lldb.
com.apple.security.get-task-allow bool true
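
A minimal sketch of how such an entitlement might be applied, assuming ad-hoc signing with Apple's codesign tool. The file name debug.entitlements and the binary path ./mongod are hypothetical, and the linked commits may do this differently.

```shell
#!/bin/sh
# Sketch only: file and binary names are illustrative, not taken from the commits.

# Write an entitlements plist that sets com.apple.security.get-task-allow to true.
write_entitlements() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.get-task-allow</key>
    <true/>
</dict>
</plist>
EOF
}

# Re-sign a binary ad hoc ("--sign -") with the given entitlements.
# Skipped on hosts where codesign or the binary is not present.
sign_with_entitlements() {
    binary=$1
    entitlements=$2
    if command -v codesign >/dev/null 2>&1 && [ -f "$binary" ]; then
        codesign --force --sign - --entitlements "$entitlements" "$binary"
    else
        echo "skipping: codesign or $binary not available" >&2
    fi
}

write_entitlements debug.entitlements
sign_with_entitlements ./mongod debug.entitlements
```

On a macOS host, `codesign -d --entitlements - ./mongod` should then display the entitlement if the signing step succeeded.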

Comment by Daniel Moody [ 08/Nov/23 ]

core dumps are not generated on macOS variants

tommaso.tocci@mongodb.com I wasn't aware #1 was also part of the problem. I could investigate why crashes are not producing cores.

Daniel Moody did you get to try out this idea?

I did not. This seems like a complex method (I don't think there is existing test infra around this). If it is the only course of action, however, we could pursue it; I think it's feasible.

Comment by Daniel Gomez Ferro [ 08/Nov/23 ]

daniel.moody@mongodb.com did you get to try out this idea?

Recently I had an idea: even though the evergreen agent gets blocked from attaching to the process due to macOS permissions, could we somehow ssh in as a real user so that there is a real interactive terminal and get the core dumps?

alex.neben@mongodb.com at some point the idea was floated of assembling a cross-functional team to look at this issue, with people from the SDP team, the BUILD team and perhaps others. Can we go ahead with it?

Comment by Tommaso Tocci [ 06/Nov/23 ]

daniel.moody@mongodb.com if I understood correctly we have two orthogonal issues here:

  1. core dumps are not generated on macOS variants
  2. the hang analyzer does not work on macOS core dumps

While I understand that we are unable to solve 2, I'm wondering if we could at least solve 1.

Comment by Daniel Moody [ 03/Nov/23 ]

tommaso.tocci@mongodb.com regarding your link, the issue is not generating a core dump, it is instead getting permission for the hang analyzer to attach to a live (hung) process and generate the core dump through the debugger.

Comment by Daniel Moody [ 03/Nov/23 ]

The problem is that macOS will not let non-tty shells attach to a process, as a security measure (if you attach you can see all the contents of memory, including sensitive information, and the assumption is that only a live tty would be a developer debugging some process). The evergreen agent runs these processes locally from its agent daemon process. If you ssh to a macOS host, you can attach to processes with no problem, as long as you have the permissions too.

I could not find a way to get the evergreen agent to attach to a process. I tried a lot of different things, and the devprod infrastructure team helped with permissions however they could, but it was unsuccessful. If you have any ideas we can certainly keep trying.

Comment by Tommaso Tocci [ 03/Nov/23 ]

alex.neben@mongodb.com did we also try something like https://nasa.github.io/trick/howto_guides/How-to-dump-core-file-on-MacOS.html ?

Comment by Alex Neben [ 27/Oct/23 ]

Sorry, I should have been more clear. According to some investigation daniel.moody@mongodb.com has done, this is impossible due to some macOS permissions issues. We can leave it open, but technically we have been unable to solve this problem.

Comment by Kaitlin Mahar [ 27/Oct/23 ]

alex.neben@mongodb.com, iryna.zhuravlova@mongodb.com there has been a good amount of interest in this ticket from server engineers over the years, based on the comment history and number of watchers, and in my own experience this has been a pain point in recent BFs and has been brought up by my teammates in replication team retrospectives this year.
I know DEVPROD-290 is in progress but my understanding from the scope doc is that project does not change anything with regards to macOS.
Can we reconsider this or can you provide some more context on why this has been deemed not worth fixing? Thank you.

Comment by Pierlauro Sciarelli [ 25/Aug/23 ]

Recently I had an idea: even though the evergreen agent gets blocked from attaching to the process due to macOS permissions, could we somehow ssh in as a real user so that there is a real interactive terminal and get the core dumps?

Any update on that?

Asking because we are still very often getting BFs from macOS variants that are nearly impossible to investigate without core dumps (e.g. BF-29687). macOS seems to be slow/different enough that it causes race conditions we rarely see in other variants.

Comment by Daniel Moody [ 20/Jun/23 ]

Recently I had an idea: even though the evergreen agent gets blocked from attaching to the process due to macOS permissions, could we somehow ssh in as a real user so that there is a real interactive terminal and get the core dumps?

Comment by Will Korteland [ 15/Jun/23 ]

Ouch, thanks for the heads up alex.neben@mongodb.com.

Comment by Alex Neben [ 13/Jun/23 ]

FYI this might not be possible. See discussion here: https://jira.mongodb.org/browse/SERVER-68902?focusedCommentId=4862069&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-4862069

Comment by Max Hirschhorn [ 02/Feb/23 ]

Flagging this ticket to get another look by the SDP team because the lack of a core dump came up again in BF-27618. What Drew said above is fully accurate. All platforms should always produce core dumps when testing in Evergreen.

Comment by Andrew Morrow (Inactive) [ 01/Mar/22 ]

iryna.zhuravlova - I don't really understand that last comment. This isn't about getting core dumps for any "production" deployments of MongoDB on macOS, but for having core dumps for crashes on macOS in CI. Clearly, having core dumps would make it easier to debug crashes, so I don't see why we wouldn't still want to do this work. CC robert.guo.

Comment by Iryna Zhuravlova [ 08/Feb/22 ]

Nobody is running macOS in production. robert.guo will sync with people and revisit if the issue still persists

Comment by Brooke Miller [ 07/Feb/22 ]

I'm going to bump this back to 'Needs Scheduling' since I saw the discussion around this in #buildbaron. In the future, please 'Flag for Scheduling' to make sure the team properly re-evaluates the request.

Comment by Kyle Suarez [ 07/Feb/22 ]

Would have been nice to have a core dump for BF-24048. Now that MMAPv1 is gone, is it feasible to enable core dumps for the server, or is there still a technical limitation?

Comment by Robert Guo (Inactive) [ 10/Jun/20 ]

Thanks for the info Mark! It does look like a dupe but for some reason not all mac hosts have coredumps enabled. E.g. this patch build (There's no "Enabling coredumps" log line)

Richard's change in SERVER-47769 only affects unittests so hopefully it won't affect anyone. I'll file a followup ticket to investigate the core pattern issue and use the -perm flag for BSD.

Comment by Mark Benvenuto [ 09/Jun/20 ]

I added collection on the server-side with SERVER-37462. Is this a duplicate of that work?

Comment by Jonathan Abrahams [ 20/Apr/17 ]

Stackoverflow post for OS X coredump filter

If we want core dumps on OS X, we cannot filter the contents. So if the size is too large (like for mmapv1), we'll have to keep it disabled.

Comment by Jonathan Abrahams [ 20/Apr/17 ]

The default location for the core files on OS X is /cores/core.<PID>
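
For illustration, a collection step could look for a file with that default name. This is a sketch under assumptions: the helper name find_core_for_pid is hypothetical, the PID 12345 is made up, and the directory is a parameter so the function can be exercised off OS X.

```shell
#!/bin/sh
# Print the path of the core file for a PID if it exists under the given
# directory (default: /cores, the OS X default location); otherwise return 1.
find_core_for_pid() {
    pid=$1
    core_dir=${2:-/cores}
    core_file="$core_dir/core.$pid"
    if [ -f "$core_file" ]; then
        printf '%s\n' "$core_file"
    else
        return 1
    fi
}

# Example: report whether a crashed process with (hypothetical) PID 12345 left a core.
find_core_for_pid 12345 || echo "no core file found for PID 12345"
```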

Comment by Jonathan Abrahams [ 20/Apr/17 ]

We can control the core pattern on OS X, like Linux:

sudo sysctl -w kern.corefile="dump_%N.%P.core"

The Linux pattern is specified as:

/sbin/sysctl -w kernel.core_pattern="dump_%e.%p.core"
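
The two commands above could be selected per platform roughly as follows. This is a sketch, assuming uname -s reports Darwin on OS X and Linux on Linux. The function name core_pattern_command is hypothetical, and the script only prints the command rather than running it, since changing the setting requires root.

```shell
#!/bin/sh
# Print the sysctl invocation that sets the core file pattern for the given OS name.
core_pattern_command() {
    case "$1" in
        Darwin) echo 'sudo sysctl -w kern.corefile="dump_%N.%P.core"' ;;
        Linux)  echo '/sbin/sysctl -w kernel.core_pattern="dump_%e.%p.core"' ;;
        *)      echo "unsupported OS: $1" >&2; return 1 ;;
    esac
}

# Show the command for the current host; run it manually (as root) to apply it.
core_pattern_command "$(uname -s)" || true
```

In both patterns the specifiers expand to the process name (%N on OS X, %e on Linux) and the process ID (%P on OS X, %p on Linux).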

Comment by Max Hirschhorn [ 20/Apr/17 ]

We should figure out whether we are able to generate core dumps on OS X or whether the issue is that we simply aren't uploading them to S3 as part of etc/evergreen.yml. Once we make that determination we can figure out what the necessary BUILD and/or SERVER ticket work is to capture them.

Generated at Thu Feb 08 06:29:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.