[SERVER-44483] Debug symbols archive should only be fetched when known to be needed Created: 07/Nov/19  Updated: 06/Dec/22  Resolved: 28/Jan/22

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Duplicate Votes: 0
Labels: meta-build-system
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-54086 Download debug symbols in resmoke han... Closed
Assigned Teams:
Server Tooling & Methods
Operating System: ALL
Participants:

 Description   

In SERVER-44017, the "do setup" task was changed to unconditionally download the debug symbols during setup for every task, in furtherance of improvements to the hang analyzer.

This has the unfortunate effect of adding bandwidth use and test startup latency to every task, because the debug symbols are enormous.

Additionally, it means that we can't defer aggregation, archiving, and uploading of the debug symbols into a separate "post compile" phase, which was one of the long-term aims of the hygienic builds project. Doing so would cut several minutes off the runtime of the "compile" phase, allowing all tests to start sooner and to run concurrently with the work to produce and upload the debug symbols archive.

As most tests don't actually hang, we should revise this such that the debug symbols are only downloaded on demand. This would have been straightforward: add the download to the timeout steps, or directly to the run hang analyzer task.

Unfortunately, the introduction of launching the hang analyzer from assert.soon in SERVER-26867 has prevented this strategy from being viable.

In order for us to be able to defer the production of the debug symbols and shorten the latency between compile starting and tests running, we need to find a way to make the hang analyzer itself download the debug symbols only when needed, rather than obtaining them unconditionally as part of setup.

Note that moving the production of debug symbols out of the 'compile' task, but allowing testing to proceed in parallel with the production of the debug symbols, means that a task that hangs very quickly, before the symbols are uploaded, won't be able to obtain them immediately. The script that the hang analyzer uses to obtain the symbols should incorporate a wait and backoff strategy so that it can obtain the symbols once they are ready.
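A minimal sketch of the wait and backoff strategy described above, assuming the symbols archive is published at a plain HTTP(S) URL. The names fetch_url and fetch_with_backoff, the attempt count, and the delays are all hypothetical and are not taken from the existing hang analyzer scripts.

```python
import time
import urllib.request


def fetch_url(url, timeout=60):
    """Download url and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_backoff(url, fetch=fetch_url, max_attempts=5,
                       base_delay=30.0, max_delay=300.0, sleep=time.sleep):
    """Fetch url, doubling the wait after each failed attempt.

    Returns the downloaded bytes, or re-raises the last error once
    max_attempts attempts have failed. All defaults are illustrative.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError:
            # urllib's URLError and HTTPError, and socket timeouts, are
            # all OSError subclasses, so "archive not uploaded yet" and
            # transient network failures both land here.
            if attempt == max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

The fetch and sleep parameters exist only so the retry loop can be exercised without a network or real delays. A production version would probably stream the archive to disk rather than hold it in memory, given how large the debug symbols are.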

CC robert.guo and ryan.timmons



 Comments   
Comment by Ryan Timmons [ 11/Nov/19 ]

We can modify hang_analyzer.py to accept a debug-symbols URL passed in by a command-line flag, but it is not straightforward to pass this through from resmoke. I'm not sure what the right solution is here, but nothing seems obvious or nice.

I would suggest that we avoid any "intelligence" in computing the download URL and instead just pass it in by a parameter or something. I don't think adding any significant complexity to evergreen.yml is a viable option (e.g. no additional tasks or levels of indirection), so it will take some thinking on how best to implement this.
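As a rough illustration of passing the URL in as a parameter, the sketch below uses argparse. The flag name --debug-symbols-url and both function names are hypothetical; this is not the actual option handling in the hang analyzer script, and it does not address how resmoke would forward the value.

```python
import argparse


def build_parser():
    """Build a parser with a single, hypothetical symbols-URL option."""
    parser = argparse.ArgumentParser(description="Hang analyzer option sketch")
    parser.add_argument(
        "--debug-symbols-url",
        dest="debug_symbols_url",
        default=None,
        help="URL of the debug symbols archive; if omitted, nothing is downloaded",
    )
    return parser


def symbols_url_or_none(argv):
    """Return the URL given on the command line, or None if it was not passed."""
    return build_parser().parse_args(argv).debug_symbols_url
```

With this shape the script computes nothing about the URL itself: the caller supplies it, and leaving the flag off means no download is attempted.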

If something hangs right away, the symbols may not be available yet. I would push back on doing any delay/retry with the download since the script is already rather complicated; the user could retry the task after the symbols are generated.

Generated at Thu Feb 08 05:06:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.