[SERVER-38060] Don't run after test hooks in resmoke if the test fails Created: 09/Nov/18 Updated: 27/Oct/23 Resolved: 05/Feb/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | tig-resmoke | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Server Tooling & Methods |
| Participants: | |||||||||||||||||
| Description |
|
We should not run the resmoke correctness hooks if a test fails and the test suite runs with --continueOnFailure. In the best case, it causes confusion in the Evergreen sidebar because there are multiple red boxes: one for the test and at least one for the hooks. In the worst case, the failing test leaves the server in an inconsistent state, which can cause the hook to hang, making debugging much more difficult. |
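A minimal sketch of the proposed behavior, for illustration only. `HookRunner`, `run_test`, and `run_suite` are hypothetical names and do not reflect resmoke.py's actual classes or hook interface; the `continue_on_failure` field only mimics the effect of --continueOnFailure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HookRunner:
    """Hypothetical stand-in for a test runner; not resmoke.py's real API."""

    after_test_hooks: List[Callable[[str], None]]
    continue_on_failure: bool = True
    log: List[str] = field(default_factory=list)

    def run_test(self, name: str, test: Callable[[], None]) -> bool:
        """Run one test; run the after-test hooks only if it passed."""
        try:
            test()
            passed = True
        except Exception as exc:
            passed = False
            self.log.append(f"{name}: FAILED ({exc})")

        if passed:
            for hook in self.after_test_hooks:
                hook(name)
        else:
            # The proposal: a failed test already produces one red box,
            # so do not add more (or risk a hang) by running hooks.
            self.log.append(f"{name}: skipping after-test hooks")
        return passed

    def run_suite(self, tests: Dict[str, Callable[[], None]]) -> Dict[str, bool]:
        """Run tests in order; stop at the first failure unless continuing."""
        results: Dict[str, bool] = {}
        for name, test in tests.items():
            results[name] = self.run_test(name, test)
            if not results[name] and not self.continue_on_failure:
                break
        return results
```

With this shape, a failing test in the middle of a suite is logged once, its hooks are skipped, and later tests still run their hooks as usual.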
| Comments |
| Comment by Ian Whalen (Inactive) [ 05/Feb/20 ] |
|
Closing as Gone away as per the last comment. |
| Comment by Ian Whalen (Inactive) [ 19/Dec/19 ] |
|
We don't believe that this will make sense anymore with the completion of PM-1547 pending. We will likely close this as Won't Fix after that unless someone objects. Please let us know. |
| Comment by Judah Schvimer [ 09/Nov/18 ] |
|
I agree with max.hirschhorn and would prefer to abort transactions before running any consistency check hooks to prevent hangs.
Tests that partition or fail nodes don't tend to run data consistency checks. They generally use their own fixture and mark that they shouldn't check data consistency. |
| Comment by William Schultz (Inactive) [ 09/Nov/18 ] |
|
max.hirschhorn I see your point. Generally, I think that it is difficult to assert that every test, upon completion, leaves the database (or cluster) in some kind of "consistent" state that won't interfere with all the consistency checks we may try to execute. For example, if a replica set test partitions nodes, fails, and forgets to heal the partitions, will all data consistency checking hooks be compatible with this? I'm not sure. For this specific case (transactions being left open at the end of tests), I agree that a separate hook for cleaning this up would be sensible. It should run before any other consistency checking hooks run, and would ideally kill any idle transactions and also report information about which transactions are being killed. I think that judah.schvimer mentioned that we may already need to build something similar to this inside the server, so that could be a starting point. |
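The ordering described above could be sketched roughly as follows. This is an assumption-laden toy: the in-memory list of dictionaries stands in for server state, and `abort_idle_transactions` and `run_after_test_hooks` are hypothetical names, not existing resmoke.py hooks or server commands.

```python
from typing import Callable, Dict, List

# Hypothetical shape for an open transaction: {"id": str, "idle": bool}
Transaction = Dict[str, object]


def abort_idle_transactions(open_txns: List[Transaction]) -> List[str]:
    """Remove idle transactions from open_txns and report each one removed."""
    report: List[str] = []
    for txn in list(open_txns):  # iterate over a copy while mutating
        if txn["idle"]:
            open_txns.remove(txn)
            report.append(f"aborted idle transaction {txn['id']}")
    return report


def run_after_test_hooks(
    open_txns: List[Transaction],
    consistency_hooks: List[Callable[[List[Transaction]], None]],
) -> List[str]:
    """Run the cleanup step first, then every consistency-checking hook."""
    report = abort_idle_transactions(open_txns)
    for hook in consistency_hooks:
        hook(open_txns)
    return report
```

Running the cleanup step first means the consistency checks only ever see transactions that are still active, and the returned report records which ones were killed.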
| Comment by Max Hirschhorn [ 09/Nov/18 ] |
I'm not confident about this proposal. There shouldn't be anything a test does that can cause data to become corrupt, and so if a test fails and also happens to corrupt data, that's interesting. Also, the decision about whether to archive data files is typically based on whether a resmoke.py hook fails, so we may end up with fewer diagnostics when debugging the test failure.
william.schultz, I would propose that you create a new hook and add it to the test suites you're interested in if there are properties you want to assert about the server's state after a test runs. |