  1. Core Server
  2. SERVER-33522

Possible to call TaskExecutor::signalEvent twice during shutdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: Internal Code
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Service Arch 2022-05-30

      When a mongos shuts down, it attempts to join with client threads only when ASAN is enabled, and even then only with a timeout before the process exits. Before this happens, it calls shutdownAndJoin() on the TaskExecutorPool, so client threads may still be running while the ThreadPoolTaskExecutor is inside join(). As part of completing, join() signals all of the unsignaled events. If join() completes just before a client thread tries to signal one of those events, the client thread signals the event a second time and triggers an invariant(). I believe this is a bug in TaskExecutor rather than a misuse of it.

      One way to solve this would be to make signalEvent() a no-op once the TaskExecutor is in shutdown. This guarantees that every event is signaled exactly once: either it is signaled before shutdown, or it is signaled as part of shutdown, and all subsequent calls to signalEvent() do nothing.
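
The first option can be sketched roughly as follows. This is a minimal model, not the actual ThreadPoolTaskExecutor code: SketchExecutor, SketchEvent, timesSignaled, and the private helper are hypothetical names introduced for illustration, and assert() stands in for invariant(). The assumption is that the shutdown flag and the event bookkeeping are guarded by the same mutex.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for illustration; not the real mongo::executor types.
struct SketchEvent {
    int timesSignaled = 0;  // guarded by SketchExecutor's mutex
};

class SketchExecutor {
public:
    using EventHandle = std::shared_ptr<SketchEvent>;

    EventHandle makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        _unsignaled.push_back(std::make_shared<SketchEvent>());
        return _unsignaled.back();
    }

    // Proposed behavior: a no-op once shutdown has begun.
    void signalEvent(const EventHandle& event) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown) {
            return;  // shutdownAndJoin() already signaled every pending event
        }
        _signalInLock(*event);
        _unsignaled.erase(std::remove(_unsignaled.begin(), _unsignaled.end(), event),
                          _unsignaled.end());
    }

    // Signals every event that is still unsignaled, as join() does in this ticket.
    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        _inShutdown = true;
        for (const auto& event : _unsignaled) {
            _signalInLock(*event);
        }
        _unsignaled.clear();
    }

private:
    void _signalInLock(SketchEvent& event) {
        assert(event.timesSignaled == 0);  // stands in for the invariant()
        ++event.timesSignaled;
    }

    std::mutex _mutex;
    bool _inShutdown = false;
    std::vector<EventHandle> _unsignaled;
};
```

In this model, a client that calls signalEvent() after shutdownAndJoin() returns simply sees the shutdown flag and leaves the event alone, so the late call no longer trips the assertion. A double signal before shutdown still does, which keeps genuine misuse detectable.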

      Another way of solving this would be to change the order of shutdown, so that we join with all client threads before shutting down the TaskExecutor. Right now, we don't even attempt to join with client threads unless we're running under ASAN, and even then, we do so with a timeout, so this would be a significant change.
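
The reordering option could look something like the sketch below. It is only an illustration under simplifying assumptions: ToyExecutor, ToyEvent, and runOrderedShutdown are hypothetical names, std::thread stands in for mongos client threads, and assert() stands in for invariant(). Here signalEvent() keeps its strict behavior, and correctness comes from joining every client, without a timeout, before the executor shuts down.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration; not the real mongo::executor types.
struct ToyEvent {
    int timesSignaled = 0;  // guarded by ToyExecutor's mutex
};

class ToyExecutor {
public:
    using EventHandle = std::shared_ptr<ToyEvent>;

    EventHandle makeEvent() {
        std::lock_guard<std::mutex> lk(_mutex);
        _events.push_back(std::make_shared<ToyEvent>());
        return _events.back();
    }

    // Unchanged behavior: signaling an event twice trips the invariant.
    void signalEvent(const EventHandle& event) {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(event->timesSignaled == 0);  // stands in for the invariant()
        ++event->timesSignaled;
    }

    // Signals every event that no client has signaled yet.
    void shutdownAndJoin() {
        std::lock_guard<std::mutex> lk(_mutex);
        for (const auto& event : _events) {
            if (event->timesSignaled == 0) {
                ++event->timesSignaled;
            }
        }
    }

private:
    std::mutex _mutex;
    std::vector<EventHandle> _events;
};

// Joins every client thread (no timeout) before shutting down the executor.
std::vector<ToyExecutor::EventHandle> runOrderedShutdown(int numClients) {
    ToyExecutor executor;
    std::vector<ToyExecutor::EventHandle> events;
    std::vector<std::thread> clients;
    for (int i = 0; i < numClients; ++i) {
        events.push_back(executor.makeEvent());
        // Even-numbered clients signal their event; odd ones exit without signaling.
        clients.emplace_back([&executor, event = events.back(), i] {
            if (i % 2 == 0) {
                executor.signalEvent(event);
            }
        });
    }
    for (auto& client : clients) {
        client.join();  // step 1: no client can call signalEvent() after this loop
    }
    executor.shutdownAndJoin();  // step 2: signal whatever is still pending
    return events;
}
```

Because no client thread outlives the join loop, shutdownAndJoin() is the last caller that can touch any event, and each event ends up signaled exactly once. The cost, as noted above, is that shutdown now waits on every client thread.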

      I believe this problem is the cause of SERVER-25497.

      AC: Choose one of the two proposed solutions (or a potential third?).

            Assignee:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Reporter:
            ian.boros@mongodb.com Ian Boros
            Votes:
            0
            Watchers:
            13

              Created:
              Updated:
              Resolved: