[SERVER-72860] Python exceptions in create_fixture_table() cause resmoke to incorrectly mark Evergreen tasks as setup failures Created: 14/Jan/23 Updated: 29/Oct/23 Resolved: 06/Feb/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | 5.0.0, 6.0.0, 6.2.0-rc6 |
| Fix Version/s: | 6.3.0-rc0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Max Hirschhorn | Assignee: | Tausif Rahman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: | Server Development Platform |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Participants: |
| Linked BF Score: | 159 |
| Description |
|
Some of the commits which were impacted by BF-27442 had a large number of setup failures (for example).
Setup failures are intentionally ignored by the Build Barons, so this can delay the identification of true failures. (Setup failures are ignored because Logkeeper instability has been generally accepted and accommodated within the testing infrastructure, see
It looks like the changes to standalone.py in 3805148 as part of
Note: The uncaught exception at fixture teardown also causes resmoke to leak processes upon exit. It may be worthwhile to revisit whether the calls to create_fixture_table() in job.py should have their own try/except block too. |
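A minimal sketch of the try/except idea raised in the note above, assuming the table is purely diagnostic output. The names `create_fixture_table_stub` and `safe_fixture_table` are hypothetical stand-ins for illustration and are not resmoke's actual code:

```python
import logging

logger = logging.getLogger("sketch.job")


def create_fixture_table_stub(fixture):
    # Hypothetical stand-in for create_fixture_table(); it raises KeyError
    # when the fixture description has no "pid" entry.
    return "pid=%d" % fixture["pid"]


def safe_fixture_table(fixture, table_fn=create_fixture_table_stub):
    # Build the diagnostic table, but never let an error in the reporting
    # helper propagate and change how the task outcome is classified.
    try:
        return table_fn(fixture)
    except Exception:
        logger.exception("Could not build the fixture table; continuing.")
        return None


ok_table = safe_fixture_table({"pid": 42})
missing_table = safe_fixture_table({})
```

With this shape, a bug in the table helper is logged with its traceback and the caller carries on with the real setup or teardown result.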
| Comments |
| Comment by Githook User [ 06/Feb/23 ] |
|
Author: {'name': 'Tausif Rahman', 'email': 'tausif.rahman@mongodb.com', 'username': 'trahman1318'}
Message: |
| Comment by Tausif Rahman (Inactive) [ 01/Feb/23 ] |
|
The challenging part was identifying where the infrastructure failure came from; it has to do with log flushing. That is the main fix here, but there is also a drive-by. I took a deeper look into how we do logging in resmoke, and here are my findings:
Proposed Fix: Add a `finally` block during fixture teardown so that no rogue logging handlers are ever left behind. There is also a drive-by that mimics the change from here so that this failure no longer happens on sharded clusters. I have https://github.com/10gen/mongo/pull/10226 ready for review to address this. |
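For illustration only, the general shape of the proposed `finally` block might look like the sketch below. It uses only the standard `logging` module; `teardown_fixture` and `failing_teardown` are hypothetical names, and this is not the code from the pull request:

```python
import logging


def teardown_fixture(fixture_logger, teardown_fn):
    """Run teardown_fn, then always flush, close, and detach the logger's handlers."""
    try:
        teardown_fn()
    finally:
        # Iterate over a copy because removeHandler() mutates the list.
        for handler in list(fixture_logger.handlers):
            try:
                handler.flush()
                handler.close()
            finally:
                fixture_logger.removeHandler(handler)


log = logging.getLogger("sketch.fixture")
log.addHandler(logging.StreamHandler())


def failing_teardown():
    # Simulates an uncaught exception during fixture teardown.
    raise RuntimeError("teardown failed")


try:
    teardown_fixture(log, failing_teardown)
except RuntimeError:
    pass  # The error still propagates, but no handler is left attached.
```

The teardown error is not swallowed here; the `finally` clause only guarantees that handler cleanup runs whether or not teardown raised.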