[SERVER-73171] Evergreen task did not upload generated core dump files to S3 due to exhausting allotted time of 15 minutes
Created: 21/Jan/23  Updated: 14/Aug/23  Resolved: 14/Aug/23

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Max Hirschhorn
Assignee: [DO NOT ASSIGN] Backlog - Server Development Platform Team (SDP) (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-79266 task-timed-out: sharding_multiversion... Open
Duplicate
duplicates SERVER-79972 Investigate making core dump archival... Closed
Related
related to SERVER-75070 increase hang analyzer self test time... Closed
Assigned Teams:
Server Development Platform
Operating System: ALL
Participants:
Linked BF Score: 130

 Description   

https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_debug_mode_config_fuzzer_concurrency_sharded_replication_0_linux_enterprise_patch_effd5996bf966b5784223e9ab08f8401c33f9d2f_63c7d5012a60ed2c8e686a76_23_01_18_11_16_46/tests?execution=0&sortBy=STATUS&sortDir=ASC is an Evergreen task which had eight (8) core dump files generated by different mongod processes.

Two (2) of the core dump files were generated due to the server process crashing. The remaining six (6) core dump files were generated by the hang analyzer being triggered by an assert.soon() due to no primary being detected for the replica set shard.

https://parsley.mongodb.com/resmoke/9b541bb6fe0ea2f05882cc5af697a11f/test/173b71b73d71610bfa1c09281283e633?bookmarks=0,12559,12560,12598,12789&selectedLine=12559
https://parsley.mongodb.com/resmoke/08fb09f2bb63325bdff85925c8f73c01/all?bookmarks=0,324944,324963,419629,466465,466466,466467,466468,466469,466470,655175

[2023/01/18 12:01:52.437] Enabling coredumps
...
[2023/01/18 16:19:19.167] Running command 'archive.targz_pack' in "save mongo coredumps" (step 13.3 of 22).
[2023/01/18 16:34:15.490] Command ''archive.targz_pack' in "save mongo coredumps"' stopped early because idle timeout duration of 900 seconds has been reached.
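
As a quick check on the two log lines above, the following minimal sketch computes how long 'archive.targz_pack' ran before it was stopped. The function name and the timestamp format string are assumptions made for illustration; only the two timestamps come from the log. The result is roughly 896 seconds, which lines up with the 15 minutes in the ticket title.

```python
from datetime import datetime

# Timestamps copied from the "Running command" and "stopped early" log lines.
STARTED = "2023/01/18 16:19:19.167"
STOPPED = "2023/01/18 16:34:15.490"

# Assumed format of the bracketed timestamps in the task log.
LOG_TIME_FORMAT = "%Y/%m/%d %H:%M:%S.%f"


def elapsed_seconds(start: str, stop: str) -> float:
    """Return the wall-clock seconds between two log timestamps."""
    begin = datetime.strptime(start, LOG_TIME_FORMAT)
    end = datetime.strptime(stop, LOG_TIME_FORMAT)
    return (end - begin).total_seconds()


elapsed = elapsed_seconds(STARTED, STOPPED)
print(f"{elapsed:.3f} s (~{elapsed / 60:.1f} min)")
```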

https://parsley.mongodb.com/evergreen/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_debug_mode_config_fuzzer_concurrency_sharded_replication_0_linux_enterprise_patch_effd5996bf966b5784223e9ab08f8401c33f9d2f_63c7d5012a60ed2c8e686a76_23_01_18_11_16_46/0/task?bookmarks=0,1153,2372,2373,2376&selectedLine=2373



 Comments   
Comment by Alex Neben [ 14/Aug/23 ]

Duplicate linked

Comment by Max Hirschhorn [ 22/Jan/23 ]

Knowing the file sizes of the core dumps would probably be helpful, but I wouldn't expect that information to be available in the observed case, where the Evergreen task timed out while compressing the core dump files.

Comment by Max Hirschhorn [ 22/Jan/23 ]

Going by the 9 samples from an earlier patch build where I intentionally triggered the hang analyzer to capture seven (7) core dumps, compressing the core dump files normally appears to take about 3 to 4.5 minutes.

[2023/01/07 05:54:40.546] Finished command 'archive.targz_pack' in "save mongo coredumps" in 2m58.083547037s.
[2023/01/07 05:29:03.260] Finished command 'archive.targz_pack' in "save mongo coredumps" in 3m52.80044446s.
[2023/01/07 05:33:23.973] Finished command 'archive.targz_pack' in "save mongo coredumps" in 4m25.990363651s.
[2023/01/07 01:51:23.581] Finished command 'archive.targz_pack' in "save mongo coredumps" in 4m23.024741873s.
[2023/01/07 01:47:42.479] Finished command 'archive.targz_pack' in "save mongo coredumps" in 3m41.47114484s.
[2023/01/07 01:47:33.969] Finished command 'archive.targz_pack' in "save mongo coredumps" in 4m14.393953749s.
[2023/01/07 01:05:18.457] Finished command 'archive.targz_pack' in "save mongo coredumps" in 3m2.115077008s.
[2023/01/07 01:04:47.752] Finished command 'archive.targz_pack' in "save mongo coredumps" in 3m20.782676291s.
[2023/01/07 02:22:17.225] Finished command 'archive.targz_pack' in "save mongo coredumps" in 2m54.578276384s.
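
To summarize the nine samples above, here is a minimal sketch that converts the logged durations to seconds and reports the minimum, maximum, and mean. The helper name and the regular expression are assumptions for illustration; the parser only handles the "<minutes>m<seconds>s" shape seen in these log lines, not every duration string a log might contain. For these samples it gives a range of roughly 2m55s to 4m26s, with a mean of about 3m39s.

```python
import re
from statistics import mean

# Durations copied from the nine "Finished command" log lines.
DURATIONS = [
    "2m58.083547037s",
    "3m52.80044446s",
    "4m25.990363651s",
    "4m23.024741873s",
    "3m41.47114484s",
    "4m14.393953749s",
    "3m2.115077008s",
    "3m20.782676291s",
    "2m54.578276384s",
]

# Matches only "<minutes>m<seconds>s", the shape used in these logs.
_DURATION_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def to_seconds(text: str) -> float:
    """Convert a duration such as "3m2.115077008s" to seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


seconds = [to_seconds(d) for d in DURATIONS]
print(f"min={min(seconds):.1f}s max={max(seconds):.1f}s mean={mean(seconds):.1f}s")
```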

Comment by Daniel Moody [ 22/Jan/23 ]

How long does compressing the core dumps usually take? Can multiple core dumps be compressed in parallel?
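
For illustration only, here is one way parallel compression could look outside of Evergreen. This is a sketch under stated assumptions, not how 'archive.targz_pack' works: it gzips each file separately instead of building a single tarball, and the function names and the max_workers default are hypothetical. Threads are used because Python's gzip/zlib compression releases the GIL while compressing.

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compress_one(core_path: Path) -> Path:
    """Gzip a single file next to the original and return the new path."""
    gz_path = core_path.with_name(core_path.name + ".gz")
    with open(core_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Stream in chunks so a large core dump is never fully loaded into memory.
        shutil.copyfileobj(src, dst)
    return gz_path


def compress_all(core_paths: list[Path], max_workers: int = 4) -> list[Path]:
    """Compress every file concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(compress_one, core_paths))
```

Whether this would help in practice depends on how many cores the host has and on whether disk I/O, not CPU, is the bottleneck, neither of which is recorded in this ticket.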

Comment by Max Hirschhorn [ 21/Jan/23 ]

I don't claim to have seen the "save mongo coredumps" function time out before, but I wanted to record it in Jira in case any patterns emerge in the context of DAG-2268 / DAG-2269.
