Core Server / SERVER-79607

ShardRegistry shutdown should not wait indefinitely on outstanding network requests

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 7.1.0-rc0, 7.0.1, 6.0.10
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Sharding EMEA
    • Fully Compatible
    • ALL
    • v7.0, v6.0
    • Sharding EMEA 2023-08-07, Sharding EMEA 2023-08-21
    • 120

      There is a detailed description in the last comment of the linked BF.


      When we shut down the shard registry, we shut down and then join the thread pool it uses. Shutting down the thread pool only changes its state to joinRequired. This stops new tasks from being scheduled, but the join must still wait for all ongoing tasks to complete and does not interrupt them. In the case shown in the BF, one of the outgoing requests never finished, which caused the join to stall.
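
      The stall can be illustrated with a small, language-neutral sketch. This is not the server's C++ thread pool: it uses Python's ThreadPoolExecutor as a stand-in, and demo_join_stall and response_arrived are hypothetical names. The assumption is only that the pool behaves as described above, where shutdown rejects new work and join waits for running work without interrupting it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def demo_join_stall():
    # Stand-in for a network reply that, in the failing case, never arrives.
    response_arrived = threading.Event()
    pool = ThreadPoolExecutor(max_workers=1)
    pending = pool.submit(response_arrived.wait)  # blocks until the "reply"

    # Shutdown without joining: new work is rejected, running work continues.
    pool.shutdown(wait=False)
    try:
        pool.submit(print, "late task")
        rejected = False
    except RuntimeError:
        rejected = True

    # An unconditional join here would block for as long as the task does.
    # Probe with a bounded wait instead, to show the task is still outstanding.
    _, not_done = wait([pending], timeout=0.2)
    stalled = pending in not_done

    # Only once the reply arrives can the join return.
    response_arrived.set()
    pool.shutdown(wait=True)
    return rejected, stalled, pending.done()
```

      If the event were never set, the final join would block forever, which matches the behaviour seen in the BF.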

      Some options to fix this:

      1. Shut down the shard registry after we interrupt ongoing operations, rather than before. This way we would know that there are no ongoing operations left to wait for when we call join. I am not sure what other implications moving this shutdown may have, though.
      2. Interrupt ongoing lookups during shard registry shutdown. This would likely imply keeping cancellation tokens so that operations can be interrupted during shutdown.
      3. Add a timeout to the network calls done by the shard registry. This solution, though, could cause shard registry refresh failures during times other than shutdown.
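
      As a rough illustration of option 2, the sketch below keeps one shared cancellation token and sets it during shutdown, so an outstanding lookup fails promptly and the join can return. It is a Python stand-in under stated assumptions, not the ShardRegistry implementation: RegistrySketch, LookupCancelled, start_lookup and the polling loop are all hypothetical, and a real fix would more likely hand the token to the networking layer than poll.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LookupCancelled(Exception):
    """Raised inside a lookup that was interrupted by shutdown."""


class RegistrySketch:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._cancel = threading.Event()  # hypothetical shared cancellation token

    def _lookup(self, reply):
        # Wait for the reply in short slices so that cancellation is noticed.
        while not reply.wait(timeout=0.01):
            if self._cancel.is_set():
                raise LookupCancelled("interrupted by shutdown")
        return "shard list"

    def start_lookup(self, reply):
        return self._pool.submit(self._lookup, reply)

    def shutdown(self):
        self._cancel.set()              # interrupt outstanding lookups first
        self._pool.shutdown(wait=True)  # the join can now complete
```

      Unlike the per-call timeout in option 3, the token only fires at shutdown, so a slow but healthy refresh at other times is left alone.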

            allison.easton@mongodb.com Allison Easton