[SERVER-67265] Errors reported from the `buildscripts_test` task are difficult to diagnose Created: 14/Jun/22  Updated: 23/Oct/23

Status: Backlog
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: [DO NOT ASSIGN] Backlog - DevProd Correctness
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Correctness
Operating System: ALL
Participants:

 Description   

I made a change where I removed a .js test, but forgot to remove it from the relevant .yml files that were referencing it.

When I ran Evergreen, I got two failed tasks: buildscripts_test and version_gen. The error in buildscripts_test clearly shows what the problem is. However, the one in version_gen is really hard to find and navigate, even though it is related to the one in buildscripts_test.

In the interest of developer productivity, this kind of failure should be made easier to diagnose.



 Comments   
Comment by Robert Guo (Inactive) [ 14/Jul/22 ]

pierlauro.sciarelli@mongodb.com I think what Iryna was trying to say is that you can configure the Evergreen notifications on this page so that you get notified when the first test fails. In this case, the failing test in buildscripts_test should fail within a few minutes of version_gen, given that neither depends on compile. Let me know if this strategy helps.

Comment by Pierlauro Sciarelli [ 14/Jul/22 ]

iryna.zhuravlova@mongodb.com that could be a way, but sometimes it still takes hours for a patch to start. And I still want to make the point that there is no reason to make a whole patch fail just because a non-existent test file is excluded from some suite.

Comment by Iryna Zhuravlova [ 12/Jul/22 ]

Hi pierlauro.sciarelli@mongodb.com! Maybe subscribing to Evergreen notifications would help in this case? That should tell you sooner if the patch build failed and give you a quick warning. Let us know if it doesn't.

Comment by Pierlauro Sciarelli [ 15/Jun/22 ]

Thanks for jumping on the issue robert.guo@mongodb.com!

Sorry for not making it clear. My main observation was the following: having a non-existent test excluded from a suite should not be a severe error that causes a whole patch to fail. Right now, it is. This can affect developer productivity a lot: sometimes I send a patch and check it after 2 hours, only to discover that nothing has been executed because of this error. This effectively slows down the development process.

Comment by Kaloian Manassiev [ 15/Jun/22 ]

robert.guo@mongodb.com, I meant that the output of version_gen is very hard to follow, not buildscripts_test. The latter actually gives me a meaningful error, but the former is very difficult to parse - I updated the description.

Comment by Robert Guo (Inactive) [ 14/Jun/22 ]

Hey Kal, I took another look at the failures and buildscripts_test is in fact behaving as expected. The misconfiguration in the suite YAMLs is being caught and reported without having to wait for compile and the test to run. Regarding the less ergonomic version_gen task failure: we're continuously integrating more of the resmoke suites with Evergreen tasks to reduce the amount of dedicated CI logic (e.g. now you don't have to specify a suite option for synonymous Evergreen tasks). This means that when there's an issue with resmoke suites, task generation can't proceed. In general, it should be safe to shuttle version_gen failures to Dev Prod, similar to failures in the old *_gen tasks. jeff.zambory@mongodb.com said DAG would be happy to make the error message in version_gen nicer if you'd prefer.

pierlauro.sciarelli@mongodb.com regarding the 1h wait time: that's absolutely unacceptable. The current process, I believe, is to enable notifications for the first failure in Evergreen, which in this case should be either version_gen or buildscripts_test. Either should show up within a few minutes. Would this be an acceptable experience? More than happy to propose other solutions if not. The test in buildscripts_test that failed should be exactly what you proposed. You can also run many checks locally with python (python -m unittest buildscripts.tests.my_test).

Comment by Kaloian Manassiev [ 14/Jun/22 ]

Suggestion from pierlauro.sciarelli@mongodb.com (from Slack):

What about proposing to simply add a task similar to the TODO check, that checks the presence of non-existing files in suite definitions?
And generates a ticket in case a problem is detected

Suggestion from max.hirschhorn@mongodb.com (from Slack):

I'd expect we'd want a DAG ticket to fix mongodb/mongo-task-generator so it doesn't crash when there's something unexpected about the YAML suite file

Comment by Pierlauro Sciarelli [ 14/Jun/22 ]

May I also argue that this kind of failure should not crash suite generation?

If someone deletes jstests/XYZ.js and this file happens to be in the exclude list of one suite, the generation of ALL the suites fails. This severely impacts developer productivity:
1. Schedule a patch
2. Wait one hour for the generation tasks to run
3. The whole patch fails

I would propose simply adding a task to the variants that checks for non-existent files in suite definitions and complains if one is detected.

Right now this case is treated as a non-recoverable error and disallows executing any test.
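
A minimal sketch of what such a check could look like, in Python. It is not how buildscripts_test or version_gen implement it. It assumes a suite definition has already been parsed into a dict with a `selector` section holding `roots` and `exclude_files` lists of paths or glob patterns relative to the repository root; the function name `find_missing_files` is hypothetical.

```python
import glob
import os
from typing import Dict, List


def find_missing_files(suite: Dict, repo_root: str = ".") -> List[str]:
    """Return the selector entries of a parsed suite that match no file on disk.

    Assumption: the suite dict has a "selector" mapping with optional
    "roots" and "exclude_files" lists of paths or glob patterns.
    """
    selector = suite.get("selector") or {}
    missing = []
    for key in ("roots", "exclude_files"):
        for pattern in selector.get(key) or []:
            full_pattern = os.path.join(repo_root, pattern)
            # glob returns an empty list when nothing matches, which also
            # covers a plain path that no longer exists.
            if not glob.glob(full_pattern, recursive=True):
                missing.append(f"{key}: {pattern}")
    return missing
```

A dedicated task could load each suite file (for example with a YAML parser such as PyYAML's `yaml.safe_load`), call this function, and report the returned entries without blocking generation of the other suites.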

Comment by Robert Guo (Inactive) [ 14/Jun/22 ]

Thanks for raising this issue, Kal! I'm assigning this to DAG given the error is from a task generation test. The error might seem obscure since it's in Rust.

Generated at Thu Feb 08 06:07:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.