[SERVER-46391] Increased task runtime after hygienic is enabled Created: 25/Feb/20 Updated: 27/Oct/23 Resolved: 07/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Build |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Sprint: | Dev Platform 2020-04-20 |
| Participants: |
| Description |
|
There seems to be a ~10sec increase in runtime for each task after hygienic is enabled. This is most pronounced on shorter tasks but is more or less consistent across all tasks. See example in attached screenshot. |
| Comments |
| Comment by Robert Guo (Inactive) [ 07/Apr/20 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Sounds good. Thank you for the analysis! | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 06/Apr/20 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Here is a commit from the good side of the change; total test time was 01:23:
And here is a commit from the "bad" side of the change; total test time was 01:55:
These are different by about 32 seconds, but there is a lot of noise overall, so much so that the good runs of the bad regime are about the same as the bad runs of the good regime. So it is hard to attribute to any one cause. However, if you look, you can see that the slow case spent about 10 more seconds downloading debug symbols:
That is a 27% increase in download time, and an overall difference of 13 seconds, in line with the observed change in test execution times between the good and the bad regimes. Additionally, if we look at the debug symbol files themselves, we find that the debug symbols file has grown from 3.1 GB in the first case to 4.0 GB in the second, a 29% increase:
Overall, this supports my contention that the additional 10 seconds is primarily driven by the cost to download the more complete debug information made available by hygienic builds. You can see that the post-hygienic version contains several debug files not present in the older one.
As you can see, the newer build properly includes debug data for test programs like mongobridge and mongotmock, which the old one lacked. Finally, I'd like to note that the unconditional download of the debug symbols in these tests, intended to support use of the hang analyzer in failed assert.soon calls, costs a whopping 57% and 53% of the entire test time. But the truly appalling part is that in both runs the overall time actually spent running resmoke is less than two seconds. The overhead we are paying to get this test running and clean up after it is astounding. robert.guo, unless you disagree with something in this analysis, I plan to close this as "works as designed". Please let me know if you have any concerns. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 06/Apr/20 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
Also note that a major change to structured logging went in at the same time: https://github.com/mongodb/mongo/commit/a84c09a19720b73cedb2e8ef7c5cfeedfa1c9761. I'm running the gle_auth_basics_passthrough on a few nearby commits to try to get a few more datapoints and hopefully make clear whether this was hygienic or structured logging, or something else. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 30/Mar/20 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
This is almost certainly due to larger debug info, since we always download the debug info, and the debug info package is now more complete (and therefore larger) than before hygienic builds. | ||||||||||||||||||||||||||||||||||||||||||||
| Comment by Mathew Robinson (Inactive) [ 28/Feb/20 ] | ||||||||||||||||||||||||||||||||||||||||||||
|
This could potentially be caused by all the tasks now doing a $PATH lookup for every binary execution? |