Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Internal Code
None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Service Arch 2022-05-30
When a mongos shuts down, it attempts to join with client threads only when ASAN is enabled, and even then it does so with a timeout before exiting the process. Before that, it calls shutdownAndJoin() on the TaskExecutorPool. Client threads may therefore still be running while a ThreadPoolTaskExecutor is inside join(). If join() completes (signaling every still-unsignaled event as part of completing) just before a client thread tries to signal one of those events, the client thread signals the event a second time and triggers an invariant(). I believe this is a bug in TaskExecutor rather than a misuse of it.
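To make the interleaving concrete, here is a minimal single-threaded C++ model of it. ToyExecutor, makeEvent(), isSignaled() and raceTriggersInvariant() are hypothetical names, not the real ThreadPoolTaskExecutor API, and a thrown std::logic_error stands in for invariant() so the failure can be observed without aborting the process. The sketch replays the bad ordering deterministically: shutdownAndJoin() signals the pending event, then a late client call signals it again.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified executor; not the real ThreadPoolTaskExecutor API.
class ToyExecutor {
public:
    using EventHandle = std::size_t;

    EventHandle makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signaled.push_back(false);
        return _signaled.size() - 1;
    }

    // A second signal of the same event is a hard error, as in the ticket.
    void signalEvent(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        _signalLocked(h);
    }

    // Completing shutdown signals every event that is still unsignaled.
    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (EventHandle h = 0; h < _signaled.size(); ++h) {
            if (!_signaled[h]) _signalLocked(h);
        }
    }

    bool isSignaled(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signaled.at(h);
    }

private:
    // std::logic_error stands in for invariant() so the failure is observable.
    void _signalLocked(EventHandle h) {
        if (_signaled.at(h)) throw std::logic_error("event signaled twice");
        _signaled.at(h) = true;
    }

    std::mutex _mutex;
    std::vector<bool> _signaled;
};

// Replays the bad interleaving deterministically: shutdown completes first,
// then a client thread that is still running signals the same event.
bool raceTriggersInvariant() {
    ToyExecutor exec;
    const auto ev = exec.makeEvent();
    exec.shutdownAndJoin();        // signals ev as part of shutdown
    assert(exec.isSignaled(ev));
    try {
        exec.signalEvent(ev);      // late client call: second signal
    } catch (const std::logic_error&) {
        return true;               // the stand-in invariant fired
    }
    return false;
}
```

In the real server the two calls come from different threads and the window is small, which is why the failure is intermittent rather than deterministic as it is here.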
One way to solve this would be to make signalEvent() a no-op when the TaskExecutor is in shutdown. This guarantees that every event is signaled exactly once: either it is signaled before shutdown, or it is signaled as part of shutdown, and every later call to signalEvent() does nothing.
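A minimal C++ sketch of the first proposal, using a hypothetical ToyExecutor (not MongoDB's API) with an assumed _inShutdown flag and a hypothetical lateSignalIsIgnored() helper. A thrown std::logic_error again stands in for invariant(). The key assumption is that the shutdown flag is set and checked under the same mutex that guards signaling, so a client cannot pass the check just before shutdown signals the event.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical simplified executor modeling proposal 1; not MongoDB's API.
class ToyExecutor {
public:
    using EventHandle = std::size_t;

    EventHandle makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signaled.push_back(false);
        return _signaled.size() - 1;
    }

    void signalEvent(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        // Proposal 1: once shutdown has begun, shutdown owns signaling,
        // so a late call from a client thread is silently ignored.
        if (_inShutdown) return;
        _signalLocked(h);
    }

    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        _inShutdown = true;  // set under the same lock that signalEvent takes
        for (EventHandle h = 0; h < _signaled.size(); ++h) {
            if (!_signaled[h]) _signalLocked(h);
        }
    }

    bool isSignaled(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signaled.at(h);
    }

private:
    // Stand-in for invariant(): a double signal outside shutdown is still fatal.
    void _signalLocked(EventHandle h) {
        if (_signaled.at(h)) throw std::logic_error("event signaled twice");
        _signaled.at(h) = true;
    }

    std::mutex _mutex;
    std::vector<bool> _signaled;
    bool _inShutdown = false;
};

// Replays the late-signal interleaving: with the no-op it completes cleanly.
void lateSignalIsIgnored() {
    ToyExecutor exec;
    const auto ev = exec.makeEvent();
    exec.shutdownAndJoin();   // shutdown signals ev exactly once
    exec.signalEvent(ev);     // late client call: ignored, no throw
    assert(exec.isSignaled(ev));
}
```

One trade-off of this design: a genuine double signal (a real misuse) that happens after shutdown begins is also silently swallowed, whereas before shutdown it still trips the check.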
Another way to solve this would be to change the shutdown order so that we join with all client threads before shutting down the TaskExecutor. Right now we don't attempt to join with client threads at all unless we're running under ASAN, and even then we use a timeout, so this would be a significant change.
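A minimal C++ sketch of the second proposal, assuming client work can be modeled as std::thread objects that the shutdown path owns. ToyExecutor, orderedShutdown() and runDemo() are hypothetical names, not MongoDB's API, and std::logic_error stands in for invariant(). The executor keeps its strict double-signal check; safety comes only from the ordering: every client thread is joined before shutdownAndJoin() signals the remaining events.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical simplified executor; a double signal is always fatal here.
class ToyExecutor {
public:
    using EventHandle = std::size_t;

    EventHandle makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        _signaled.push_back(false);
        return _signaled.size() - 1;
    }

    void signalEvent(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        _signalLocked(h);
    }

    // Signals every event that is still unsignaled.
    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (EventHandle h = 0; h < _signaled.size(); ++h) {
            if (!_signaled[h]) _signalLocked(h);
        }
    }

    bool isSignaled(EventHandle h) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _signaled.at(h);
    }

private:
    void _signalLocked(EventHandle h) {
        if (_signaled.at(h)) throw std::logic_error("event signaled twice");
        _signaled.at(h) = true;
    }

    std::mutex _mutex;
    std::vector<bool> _signaled;
};

// Proposal 2: join every client thread before shutting down the executor,
// so no client can signal an event after shutdown has signaled the rest.
void orderedShutdown(ToyExecutor& exec, std::vector<std::thread>& clients) {
    for (auto& t : clients) t.join();  // step 1: wait for all client work
    exec.shutdownAndJoin();            // step 2: signal whatever is left
}

// Clients signal the even-numbered events concurrently; shutdown signals the rest.
void runDemo() {
    ToyExecutor exec;
    std::vector<ToyExecutor::EventHandle> events;
    for (int i = 0; i < 8; ++i) events.push_back(exec.makeEvent());

    std::vector<std::thread> clients;
    for (std::size_t i = 0; i < events.size(); i += 2) {
        const auto h = events[i];
        clients.emplace_back([&exec, h] { exec.signalEvent(h); });
    }

    orderedShutdown(exec, clients);
    for (auto h : events) assert(exec.isSignaled(h));  // each signaled once
}
```

In this model the ordering is safe only because every join() is unconditional. If client threads are joined with a timeout, as mongos does today under ASAN, a straggler that outlives the timeout could still signal after shutdown, which is part of why this would be a significant change.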
I believe this problem is the cause of SERVER-25497.
AC (acceptance criteria): choose one of the two proposed solutions, or a third one if a better option emerges.
- is related to: SERVER-25497 Fix sharded query path to handle shutdown of the mongos process (Closed)