Type: Task
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Fully Compatible
61
This fixes several bugs in the core analyzer:
- Previously, the core analyzer looked at the name of the core dump to determine which binary generated it. This is not reliable, because a core dump can be named something other than the binary it came from, so the analyzer now uses gdb to determine the correct binary.
- Validates the core dumps on the failed tasks to ensure we know how to process at least one of them before generating a task. This prevents us from generating a task that does nothing because it cannot process any of the core dumps.
- Makes core dump downloading and uploading less error-prone. I ran into issues where a small, inconsistent number of core dumps were corrupted or were not valid gzip files. I suspect pigz was the cause, so I removed it and now use the standard gzip library. I also removed the timeout in the fast_archive function, because Evergreen increased the default timeout for the post section to 30 minutes and I don't expect us to get close to that limit. Finally, downloads are now retried per core dump instead of retrying the download of all cores at once, so a failure to download one core dump no longer ruins the whole task.
- Reduces the number of workers used when running gdb. Occasionally, while the core dumps are being analyzed, Evergreen terminates the host as system-unresponsive because it failed to return a heartbeat. My guess is that this happens because we are saturating every available thread on the machine, so hopefully lowering the number of concurrent workers will help.
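A minimal sketch of per-core retries combined with standard-library gzip, assuming a hypothetical `fetch(url, dest_path)` callable that downloads one core dump and a hypothetical `dest_for(url)` that picks its local path. The retry count and delay are illustrative defaults, not the analyzer's real settings. Each core is retried on its own, and a download only counts as successful if the file decompresses as valid gzip.

```python
import gzip
import shutil
import time
import zlib
from typing import Callable, Dict, List


def compress_core(src_path: str, dest_path: str) -> None:
    """Gzip a core dump with the standard library (in place of pigz)."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)


def is_valid_gzip(path: str) -> bool:
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as f:
            # Read in 1 MiB chunks so large cores are never fully in memory.
            while f.read(1024 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header, truncated stream, or corrupted deflate data.
        return False


def download_cores(
    core_urls: List[str],
    fetch: Callable[[str, str], None],  # hypothetical downloader
    dest_for: Callable[[str], str],     # hypothetical URL -> local path
    attempts: int = 3,                  # illustrative default
    delay_secs: float = 1.0,            # illustrative default
) -> Dict[str, bool]:
    """Download each core separately, retrying only the core that failed."""
    results: Dict[str, bool] = {}
    for url in core_urls:
        dest = dest_for(url)
        ok = False
        for attempt in range(attempts):
            try:
                fetch(url, dest)
                ok = is_valid_gzip(dest)
            except OSError:
                ok = False
            if ok:
                break
            if attempt < attempts - 1:
                time.sleep(delay_secs)
        # A core that never succeeds is recorded as failed; the rest still count.
        results[url] = ok
    return results
```

The standard `gzip` module is single-threaded, so it trades pigz's speed for one well-tested code path; with the 30-minute post-section timeout, that trade seems acceptable.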
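Capping concurrency can be sketched with a bounded thread pool. Because each worker mostly waits on a gdb subprocess, threads are enough, and `max_workers` bounds how many gdb processes run at once. The "half the CPUs" rule in `reduced_worker_count` is a hypothetical choice for illustration; the ticket does not state the actual worker count.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def reduced_worker_count(fraction: float = 0.5) -> int:
    """Use a fraction of the CPUs (at least one) instead of all of them.

    The 0.5 default is an assumption, not the analyzer's real setting.
    """
    cpus = os.cpu_count() or 1
    return max(1, int(cpus * fraction))


def analyze_all(cores: Iterable[T], analyze: Callable[[T], R], max_workers: int) -> List[R]:
    """Run `analyze` (e.g. a function that shells out to gdb) on every core,
    with at most `max_workers` calls in flight, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, cores))
```

Leaving CPUs idle gives the Evergreen agent room to send its heartbeat while the analysis runs.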
Old description below:
Currently the task is generated if any core dumps are found on the task. Sometimes we upload core dumps from processes that are not mongo binaries, so no analysis is done if all of the core dumps come from non-mongo processes.
We need to be smarter about when we generate the task by checking whether at least one of the core dumps is from a known binary.
An example failure caused by this issue is here: https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_all_feature_flags_display_replica_sets_abe6f7a64d785277fb223958957252c6f8f89027_23_09_21_11_09_00
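The check above, which identifies each core's binary with gdb instead of the file name and generates the task only if at least one is known, can be sketched as follows. This assumes gdb is on the `PATH` and prints its usual "Core was generated by" line, which names the program's command line, when it loads a core file. `KNOWN_BINARIES`, `binary_for_core`, and `should_generate_task` are hypothetical names, not the analyzer's actual API.

```python
import os
import re
import subprocess
from typing import Iterable, Optional

# Matches gdb's "Core was generated by `<command line>'." message.
_GENERATED_BY = re.compile(r"Core was generated by `([^']*)'")

# Hypothetical set of binaries the analyzer knows how to process.
KNOWN_BINARIES = frozenset({"mongod", "mongos", "mongo"})


def binary_from_gdb_output(output: str) -> Optional[str]:
    """Extract the executable's base name from gdb's core-loading output."""
    match = _GENERATED_BY.search(output)
    if not match or not match.group(1).strip():
        return None
    # The first token of the recorded command line is the program path.
    return os.path.basename(match.group(1).split()[0])


def binary_for_core(core_path: str) -> Optional[str]:
    """Ask gdb which program produced `core_path` (requires gdb on PATH)."""
    result = subprocess.run(
        ["gdb", "--batch", "--core", core_path],
        capture_output=True,
        text=True,
        check=False,
    )
    return binary_from_gdb_output(result.stdout + result.stderr)


def should_generate_task(binary_names: Iterable[Optional[str]]) -> bool:
    """Generate the analysis task only if at least one core is processable."""
    return any(name in KNOWN_BINARIES for name in binary_names)
```

Usage would be `should_generate_task(binary_for_core(p) for p in core_paths)`, so a task whose only cores come from non-mongo processes is never generated.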