[SERVER-42356] teardown(finished=True) isn't ever called for the NoOpFixture Created: 23/Jul/19 Updated: 29/Oct/23 Resolved: 06/Aug/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.1, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Max Hirschhorn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | tig-resmoke | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: |
v4.2
|
||||||||||||||||||||
| Sprint: | STM 2019-08-12 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 31 | ||||||||||||||||||||
| Story Points: | 1 | ||||||||||||||||||||
| Description |
|
The flush thread will block forever for the next event if there isn't one lined up. We should make it not wait. One way could be to use the non-blocking version of scheduler.run() |
| Comments |
| Comment by Githook User [ 13/Aug/19 ] | |||||||||||||||||||||||||||||||||
|
Author: {'name': 'Max Hirschhorn', 'username': 'visemet', 'email': 'max.hirschhorn@mongodb.com'}Message: (cherry picked from commit 39e0f9648d012e556b24f3ee1728675bd9b197bb) | |||||||||||||||||||||||||||||||||
| Comment by Githook User [ 06/Aug/19 ] | |||||||||||||||||||||||||||||||||
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}Message: | |||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 04/Aug/19 ] | |||||||||||||||||||||||||||||||||
After adding some tracing to BufferedHandler.__init__(), BufferedHandler.flush(), and BufferedHandler.close(), it turns we weren't calling close() on the FixtureLogger instance for the NoOpFixture. This is because Job.teardown_fixture() immediately returns without calling NoOpFixture.teardown(). I also found a couple other cases where close() wouldn't be called which I've written up in Steps to reproduceApply the following patch to add the CleanEveryN hook to a test suite that uses the NoOpFixture. Note that specifying n=1 isn't sufficient on its own and ASAN_OPTIONS=detect_leaks=1 is necessary because it causes the CleanEveryN hook to log a message. (BufferedHandler.emit() lazily schedules the first flush() event and would otherwise never be called.)
| |||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 25/Jul/19 ] | |||||||||||||||||||||||||||||||||
I don't think the issue has anything to do with scheduler.run() then. I believe are two possibilities:
| |||||||||||||||||||||||||||||||||
| Comment by Robert Guo (Inactive) [ 24/Jul/19 ] | |||||||||||||||||||||||||||||||||
|
Yeah the queue is not empty. There's one thing [Event(time=1564005134.2195768, priority=0, action=<bound method BufferedHandler.flush of <BuildloggerGlobalHandler (NOTSET)>>, argument=(), kwargs={})] that keeps getting rescheduled in BufferedHandler.flush. Also close_later()'s scheduled handler.close doesn't show up in the scheduler... | |||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 24/Jul/19 ] | |||||||||||||||||||||||||||||||||
|
What were the contents of self.__scheduler._queue when you were debugging this? scheduler.run() won't return if there's something in the queue after all. | |||||||||||||||||||||||||||||||||
| Comment by Robert Guo (Inactive) [ 24/Jul/19 ] | |||||||||||||||||||||||||||||||||
|
Odd.. I'm not seeing scheduler.run() return. Here's my git diff:
Here's the log output: https://gist.github.com/guoyr/cdeb0253bad5915b81409ce6d5d3ac41 | |||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 24/Jul/19 ] | |||||||||||||||||||||||||||||||||
robert.guo, I'm not following this issue description. Are you saying the flush thread still runs even after the job threads all exit? That behavior on its own isn't a bug - the close_later() API for the flush thread exists because we don't wait for logs to be flushed to logkeeper before running the next test or test suite.
returns immediately. The blocking=False parameter would make it so the scheduler doesn't call the delayfunc parameter. However, the purpose of the interruptible_sleep() function is to make it so the flush thread can be woken up if new work enters the queue or if the thread should exit. |