[SERVER-19895] resmoke failures should self-document Created: 12/Aug/15 Updated: 10/May/22 Resolved: 22/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Raiden Worley (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | tig-qwin-eligible, tig-resmoke | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | TIG E (01/08/16), STM 2019-11-14, STM 2019-11-28 |
| Participants: | |
| Story Points: | 2 |
| Description |
|
When resmoke fails, it should print out steps to help the user debug the failure. E.g. when resmoke detects that it's run in Evergreen, it should print out the places that the user should look for symptoms. Original description: |
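A minimal sketch of what such self-documenting output might look like. The function `debugging_hints` and its `in_evergreen` flag are hypothetical and not part of resmoke; how resmoke actually detects an Evergreen run is left to the caller. The hint text is drawn only from the places mentioned in this ticket (sidebar boxes, test logs, task logs).

```python
def debugging_hints(in_evergreen):
    """Return the lines to print after a failed run.

    Hypothetical helper, not part of resmoke; the caller decides how
    "running in Evergreen" is detected.
    """
    hints = ["The run failed. Suggested places to look:"]
    if in_evergreen:
        hints += [
            "  1. Red boxes in the Evergreen sidebar (follow their links).",
            "  2. The test logs for each failed entry.",
            "  3. The task logs, for context around the failure.",
        ]
    else:
        hints.append("  1. The output above, starting from the first failure.")
    return hints


if __name__ == "__main__":
    print("\n".join(debugging_hints(in_evergreen=True)))
```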
| Comments |
| Comment by Githook User [ 22/Nov/19 ] | |||||||||
|
Author: Carl Worley (carl.worley@mongodb.com) Message: | |||||||||
| Comment by Raiden Worley (Inactive) [ 22/Nov/19 ] | |||||||||
| Comment by Robert Guo (Inactive) [ 12/Sep/19 ] | |||||||||
|
Excellent! I updated the ticket to say that resmoke's error handling and reporting mechanism should gather and display debugging instructions in addition to the sometimes obscure error messages. | |||||||||
| Comment by Eric Milkie [ 12/Sep/19 ] | |||||||||
Yes, exactly! Perhaps all we need to do is document this better – this is all I need to keep in mind to help me diagnose these in the future. | |||||||||
| Comment by Robert Guo (Inactive) [ 12/Sep/19 ] | |||||||||
The original request in this ticket was to make it easier to detect if a process exited abnormally. The solution was to add an item (jobx_fixture_teardown) to the sidebar that will turn red if a fixture crashed. This extra red box is meant to replace searching for "not 'exited with code 0'", which I believe it does now with a much more explicit error message: "An error occurred during the teardown..." Maybe the confusion is that when there are multiple red boxes, it isn't clear that one should look for teardown/setup failures first, then hook failures, then test failures? The teardown/setup failure box should replace the need to look for "error" in the task logs. Happy to chat more if the above comment doesn't help. | |||||||||
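The triage order suggested above (teardown/setup failures first, then hook failures, then test failures) could be sketched as a simple sort. This is an illustration only: the `(kind, name)` tuples, the kind labels, and the `triage` function are assumptions for this example and not resmoke's data model.

```python
# Lower number = look at this kind of failure first (assumed labels).
TRIAGE_ORDER = {"fixture": 0, "hook": 1, "test": 2}


def triage(failures):
    """Order (kind, name) failure records by how early to inspect them.

    sorted() is stable, so failures of the same kind keep their
    original relative order.
    """
    return sorted(failures, key=lambda failure: TRIAGE_ORDER[failure[0]])


if __name__ == "__main__":
    red_boxes = [
        ("test", "some_test.js"),            # hypothetical test name
        ("fixture", "job0_fixture_teardown"),
        ("hook", "some_hook"),               # hypothetical hook name
    ]
    for kind, name in triage(red_boxes):
        print(f"{kind}: {name}")
```

Printing the red boxes in this order would make it clearer which one to open first when several are shown.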
| Comment by Eric Milkie [ 11/Sep/19 ] | |||||||||
|
But that isn't true, actually. After clicking on the links for the red boxes, I must then try to figure out what went wrong. The way that Evergreen divides up logs into tests is subtle, so I end up looking at a combination of all the "failed" test logs, and the task log, to try to figure out what the actual problem is. This involves trying to parse the task log, and then I end up at the point where my suggestion starts. As an example, take the failure that Vlad linked above. There are three "tests" that are flagged as failed, but only one of them truly shows you logs that identify the problem. | |||||||||
| Comment by Robert Guo (Inactive) [ 10/Sep/19 ] | |||||||||
|
Thanks for the suggestion Eric. The roundabout but utilitarian answer to both of your questions is that you don't have to worry about any of it now. All failures/errors/whatevers are expressed as red boxes in the Evergreen sidebar. The thing to remember now is just to click on the link(s) for the red box(es). There are some historic reasons for the error/failure mishmash and printing the report summary 3 times. I will convert this ticket to address it. | |||||||||
| Comment by Eric Milkie [ 10/Sep/19 ] | |||||||||
|
Cool, thanks. I think we should convert this ticket to a little cleanup in the task output for failures such as this one.
Can we make it so that no one has to remember what the difference between an "errored" test and a "failed" test is? (Is the difference that "failed" tests returned a bad status from running a program? If that's true, how can "1 DB Exception" be an exit code from such a test?) Even more confusingly, the output for job0_fixture_teardown in the task log literally says
So did it fail or did it error? Ideally, we'd remove this distinction to reduce confusion. | |||||||||
| Comment by Vlad Rachev (Inactive) [ 10/Sep/19 ] | |||||||||
|
In the test logs:
| |||||||||
| Comment by Eric Milkie [ 10/Sep/19 ] | |||||||||
|
We waited too long since Max posted his example back in January, so the log data has been purged from logkeeper. Do you have a more recent example of this? | |||||||||
| Comment by Vlad Rachev (Inactive) [ 09/Sep/19 ] | |||||||||
|
milkie A detailed log message is written to the "test logs". That seems sufficient to me. Would you also like that same log message to be written to the "task logs", as max.hirschhorn suggested? If not, I will close the ticket. | |||||||||
| Comment by Max Hirschhorn [ 23/Jan/19 ] | |||||||||
|
I believe the goal of this ticket should be to make the
log message that's written to the "task logs" include the same level of detail as the
log message that's written to the "test logs". | |||||||||
| Comment by Michael O'Brien [ 05/Jan/16 ] | |||||||||
|
Writing causes of failure to stderr might also help when resmoke is run in Evergreen, since log lines from stderr show up in red and are easily noticed. | |||||||||
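A small sketch of routing the failure cause to stderr, assuming a hypothetical `report_failure` helper (not a resmoke function). The `stream` parameter exists only so the behavior can be checked without touching the real stderr, and the "FAILURE:" prefix is an arbitrary choice for this example.

```python
import sys


def report_failure(cause, stream=None):
    """Write a one-line failure cause to stderr (or to the given stream)."""
    if stream is None:
        stream = sys.stderr
    print(f"FAILURE: {cause}", file=stream)


if __name__ == "__main__":
    # Hypothetical message; regular progress output would stay on stdout.
    report_failure("an error occurred during the teardown of a fixture")
```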
| Comment by Michael O'Brien [ 05/Jan/16 ] | |||||||||
|
Would also like to make job teardown messages stand out more when they fail, for example instead of:
something like:
|