[SERVER-56167] Guarantee hang analyzer collects core dumps for sharded clusters, at minimum
Created: 19/Apr/21  Updated: 29/Oct/23  Resolved: 07/Jun/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 5.0.6, 5.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Mikhail Shchatko
Resolution: Fixed Votes: 2
Labels: dp-qp-stakeholder-request-2021-04, dp-qp-stakeholder-request-2021-07, tig-hanganalyzer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-72613 Speed up taking core dumps with the h... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: STM 2021-06-14
Linked BF Score: 164
Story Points: 2

 Description   

Attaching gdb to every process in a sharded cluster and collecting diagnostics from each one still runs past the 15-minute timeout. BF-20581 is a recent example in which gdb attached to only 6 of the 9 mongod processes. As a result, server engineers may have to rely on luck, or on access to multiple occurrences of the failure, to work out the cause of a hang.

We should consider either:
1. reordering the steps in the hang analyzer so that a core dump is captured for every mongod process, even if diagnostics against the live process cannot be collected; or
2. sending a SIGABRT to any process that gcore has not been run on before the 15 minutes expire.
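The two options above can be combined: take core dumps first, fall back to SIGABRT for any process that could not be dumped in time, and give live diagnostics only the leftover budget. The following is a minimal Python sketch of that scheduling, not the hang analyzer's actual code. The names `hang_analyze`, `gcore_dump`, `send_sigabrt`, `skip_diagnostics`, the `abort_reserve_secs` margin and the `dump` file prefix are hypothetical. It assumes gdb's `gcore` script is on PATH and that core dumps are enabled (`ulimit -c`); otherwise SIGABRT terminates the process without writing a core file.

```python
import os
import signal
import subprocess
import time
from typing import Callable, List, Sequence, Tuple


def gcore_dump(pid: int, timeout: float) -> bool:
    """Write a core file for a live process with gdb's `gcore` script.

    Returns False if gcore is missing, fails, or exceeds `timeout` seconds
    (subprocess.run kills its direct child when the timeout expires).
    """
    try:
        result = subprocess.run(
            ["gcore", "-o", "dump", str(pid)],  # hypothetical prefix: writes dump.<pid>
            capture_output=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def send_sigabrt(pid: int) -> None:
    """Abort the process so the kernel writes a core file (if ulimit -c allows)."""
    os.kill(pid, signal.SIGABRT)


def skip_diagnostics(pid: int, timeout: float) -> None:
    """Hypothetical placeholder for live-process diagnostics (e.g. gdb backtraces)."""


def hang_analyze(
    pids: Sequence[int],
    budget_secs: float,
    dump: Callable[[int, float], bool] = gcore_dump,
    abort: Callable[[int], None] = send_sigabrt,
    diagnose: Callable[[int, float], None] = skip_diagnostics,
    clock: Callable[[], float] = time.monotonic,
    abort_reserve_secs: float = 30.0,
) -> Tuple[List[int], List[int], List[int]]:
    """Secure a core for every pid before spending time on live diagnostics."""
    deadline = clock() + budget_secs
    dumped: List[int] = []
    aborted: List[int] = []

    # Phase 1 (option 1): core dumps come first. Hold back `abort_reserve_secs`
    # so there is always time left to signal the processes gcore missed.
    for pid in pids:
        remaining = deadline - clock() - abort_reserve_secs
        if remaining > 0 and dump(pid, remaining):
            dumped.append(pid)
        else:
            # Option 2: gcore did not finish in time, so fall back to SIGABRT.
            abort(pid)
            aborted.append(pid)

    # Phase 2: live diagnostics only get whatever budget is left, and only
    # for processes that are still alive (those that were not aborted).
    diagnosed: List[int] = []
    for pid in dumped:
        remaining = deadline - clock()
        if remaining <= 0:
            break
        diagnose(pid, remaining)
        diagnosed.append(pid)

    return dumped, aborted, diagnosed
```

For example, `hang_analyze(pids, budget_secs=15 * 60)` would stay within the existing 15-minute window while still leaving a core file (from gcore or from SIGABRT) for every process. Injecting `dump`, `abort` and `clock` keeps the scheduling logic testable without gdb or real processes.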



 Comments   
Comment by Githook User [ 13/Dec/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-56167 Guarantee hang analyzer collects core dumps for sharded clusters

(cherry picked from commit 940116b555f9b5e624b13d4465ebfa83a00ee049)
Branch: v5.0
https://github.com/mongodb/mongo/commit/4b2608fc881f8443c0945a060129b92282674f3d

Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fix version since branching activities occurred yesterday. This ticket will be included in rc0 once rc0 has been triggered. For the latest release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 07/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-56167 Guarantee hang analyzer collects core dumps for sharded clusters
Branch: master
https://github.com/mongodb/mongo/commit/940116b555f9b5e624b13d4465ebfa83a00ee049

Generated at Thu Feb 08 05:38:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.