[SERVER-33522] Possible to call TaskExecutor::signalEvent twice during shutdown Created: 27/Feb/18 Updated: 29/Oct/23 Resolved: 18/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Amirsaman Memaripour |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Service Arch 2022-05-30 |
| Participants: | |
| Description |
|
When a mongos shuts down, it only attempts to join with client threads when ASAN is enabled, and even then it does so with a timeout before exiting the process. Before this happens, it calls shutdownAndJoin on the TaskExecutorPool. Client threads may therefore still be running while the ThreadPoolTaskExecutor is in a call to join(). If join() completes (and, as part of completing, signals all of the unsignaled events) just before a client thread tries to signal an event, the client thread will signal the event a second time and trigger an invariant().

I believe this is a bug in TaskExecutor rather than a misuse of it.

One way to solve this would be to make signalEvent() a no-op when the TaskExecutor is in shutdown. This guarantees that every event is signaled exactly once: either it is signaled before shutdown, or it is signaled as part of shutdown, and all subsequent calls to signalEvent() do nothing.

Another way to solve this would be to change the order of shutdown so that we join with all client threads before shutting down the TaskExecutor. Right now we don't even attempt to join with client threads unless we're running under ASAN, and even then we do so with a timeout, so this would be a significant change.

I believe this problem is the cause of:

AC: Choose one of the two proposed solutions (or a potential third?). |
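For illustration, here is a minimal sketch of the first proposal (signalEvent() becoming a no-op once shutdown has begun). It is not the actual TaskExecutor code: `SketchEvent`, `SketchExecutor`, and all of their members are hypothetical names invented for this example, and the `assert` only stands in for the server's invariant().

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical, simplified stand-in for an executor event (not the real type).
struct SketchEvent {
    bool signaled = false;
    int signalCount = 0;  // tracked only to demonstrate "signaled exactly once"
};

// Hypothetical executor illustrating "signalEvent() is a no-op in shutdown".
class SketchExecutor {
public:
    // Returns a new unsignaled event, or nullptr once shutdown has begun.
    std::shared_ptr<SketchEvent> makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown) {
            return nullptr;
        }
        auto ev = std::make_shared<SketchEvent>();
        _events.push_back(ev);
        return ev;
    }

    // Signals the event, unless shutdown has already taken responsibility
    // for signaling every outstanding event.
    void signalEvent(const std::shared_ptr<SketchEvent>& ev) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown) {
            return;  // shutdownAndJoin() signaled (or will signal) this event
        }
        _signalInLock(*ev);
    }

    // Marks the executor as shut down and signals every unsignaled event.
    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        _inShutdown = true;
        for (auto& ev : _events) {
            if (!ev->signaled) {
                _signalInLock(*ev);
            }
        }
        _events.clear();
    }

private:
    // Caller must hold _mutex.
    void _signalInLock(SketchEvent& ev) {
        assert(!ev.signaled);  // stands in for the invariant() that fires today
        ev.signaled = true;
        ++ev.signalCount;
    }

    std::mutex _mutex;
    bool _inShutdown = false;
    std::vector<std::shared_ptr<SketchEvent>> _events;
};
```

Because the shutdown flag and the event state are checked under the same mutex, a late signalEvent() from a client thread either runs entirely before shutdown signals the event or sees the flag and returns, so the double-signal check is never reached in this sketch.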
| Comments |
| Comment by Githook User [ 18/May/22 ] |
|
Author: {'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'} Message: |
| Comment by Max Hirschhorn [ 08/Apr/22 ] |
|
Reopening this ticket because server crashes at shutdown are undesirable and lead to scary-looking backtraces in server logs of our end users. |
| Comment by Lauren Lewis (Inactive) [ 24/Feb/22 ] |
|
We haven’t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket. |
| Comment by Ruoxin Xu [ 15/Oct/20 ] |
|
matthew.tretin The patch for |
| Comment by Matthew Tretin (Inactive) [ 12/Oct/20 ] |
|
ruoxin.xu Do you think your CR for |
| Comment by Andy Schwerin [ 28/Feb/18 ] |
|
Oh, yeah, when I built the TaskExecutor, I didn't really consider Events that would be signaled from outside of Callbacks while working out the shutdown logic. |