[SERVER-50612] ScopedTaskExecutor violates the TaskExecutor contract Created: 28/Aug/20 Updated: 14/Jul/23 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | servicearch-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Assigned Teams: |
Service Arch
|
||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||||||||||||||
| Story Points: | 5 | ||||||||||||||||||||||||||||
| Description |
|
Implementations of TaskExecutor::scheduleWork are required not to move-from their provided 'work' CallbackFn when returning a non-ok status. This requirement is relied on in TaskExecutor::schedule to call the callback with the non-ok status returned from scheduleWork. ScopedTaskExecutor is violating this contract and moving-from the provided work callback even when returning a non-ok Status from scheduleWork. We have evidence of this happening from build failures. While not confirmed, the likely culprit is this line. We need to fix this or else risk triggering this invariant in production runs. |
| Comments |
| Comment by Lauren Lewis (Inactive) [ 21/Dec/21 ] |
|
We haven’t heard back from you in at least 1 year, so I'm going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. |
| Comment by Spencer Brody (Inactive) [ 21/Sep/20 ] |
|
So my previous comment about how to work around this isn't quite right. Just ensuring that the ScopedTaskExecutor is shut down before the parent isn't enough. A concurrent race of calling schedule() against the ScopedTaskExecutor alongside shutting down the parent executor can trigger this, even if the ScopedTaskExecutor is always shut down before the parent executor. You can get an interleaving where the call to schedule() against the STE runs before the STE is shut down, but the call to schedule() against the underlying parent executor doesn't happen until after the parent executor is shut down. So to be resilient to this you have to ensure that no calls to schedule() against any outstanding STEs will happen anymore before you shut down the parent executor (something that may be harder to guarantee in practice). |
| Comment by Spencer Brody (Inactive) [ 28/Aug/20 ] |
|
Note to other readers: the way the bug in ScopedTaskExecutor is triggered is if the parent executor that ScopedTaskExecutor was created against is shut down before the scoped executor is. You can avoid this bug by ensuring that all ScopedTaskExecutors get shutdown() before their parent executor. |
| Comment by Bruce Lucas (Inactive) [ 28/Aug/20 ] |
|
OK, thanks. |
| Comment by Spencer Brody (Inactive) [ 28/Aug/20 ] |
|
The bug in ScopedTaskExecutor does technically exist in 4.4, but ScopedTaskExecutor is way less used in 4.4 and to my knowledge there are no uses that can trigger this bug. |
| Comment by Bruce Lucas (Inactive) [ 28/Aug/20 ] |
|
spencer, is this issue present in 4.4 or only master? If the former it would be good if possible to leave a comment on this ticket mentioning the string that the invariant would produce, for searchability in case it does happen. |