ShardRegistry::reload() does blocking work on NetworkInterface thread


    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: 3.4.14, 3.6.4
    • Component/s: Networking, Sharding
    • Operating System: ALL
    • Sprint: Platforms 2018-01-01, Platforms 2018-01-15
    • 3

      The ShardRegistry uses the TaskExecutor to schedule periodic _internalReload() calls. These jobs are dispatched to NetworkInterfaceASIO and eventually run on its thread.
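      The periodic reload pattern can be sketched in miniature: a job that runs on an executor and re-arms itself on that same executor when it finishes. This is a minimal sketch under stated assumptions, not MongoDB code. `SingleThreadExecutor`, `scheduleAfter()` and `runPeriodicReload()` are hypothetical stand-ins for the TaskExecutor and ShardRegistry, and the only property carried over from the description is that every job ends up running on one thread.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical single-threaded executor standing in for a TaskExecutor whose
// jobs all end up on one network thread. Not MongoDB's real API.
class SingleThreadExecutor {
public:
    using Clock = std::chrono::steady_clock;
    using Task = std::function<void()>;

    SingleThreadExecutor() : _thread([this] { _run(); }) {}
    ~SingleThreadExecutor() { shutdown(); }

    // Queue `task` to run on the executor thread once `delay` has elapsed.
    void scheduleAfter(Clock::duration delay, Task task) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_stopping) return;  // drop work submitted after shutdown
        _queue.push(Entry{Clock::now() + delay, _nextSeq++, std::move(task)});
        _cv.notify_one();
    }

    // Stop the loop (pending tasks are dropped) and join the thread.
    void shutdown() {
        { std::lock_guard<std::mutex> lk(_mutex); _stopping = true; }
        _cv.notify_one();
        if (_thread.joinable()) _thread.join();
    }

private:
    struct Entry {
        Clock::time_point when;
        unsigned long long seq;  // FIFO tie-break for equal deadlines
        Task task;
    };
    // Orders the priority_queue so the earliest deadline is on top.
    struct RunsLater {
        bool operator()(const Entry& a, const Entry& b) const {
            return a.when != b.when ? a.when > b.when : a.seq > b.seq;
        }
    };

    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (!_stopping) {
            if (_queue.empty()) { _cv.wait(lk); continue; }
            Clock::time_point deadline = _queue.top().when;
            if (Clock::now() < deadline) {
                _cv.wait_until(lk, deadline);
                continue;  // re-check: shutdown or an earlier task may have arrived
            }
            Task task = _queue.top().task;
            _queue.pop();
            lk.unlock();
            task();  // run unlocked so the task can schedule follow-up work
            lk.lock();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::priority_queue<Entry, std::vector<Entry>, RunsLater> _queue;
    unsigned long long _nextSeq = 0;
    bool _stopping = false;
    std::thread _thread;  // declared last: starts after the members it uses exist
};

// Hypothetical analogue of the periodic _internalReload(): each run does its
// (here trivial, non-blocking) work, then re-arms itself on the same executor.
int runPeriodicReload(int times, std::chrono::milliseconds period) {
    SingleThreadExecutor executor;
    std::mutex m;
    std::condition_variable done;
    int runs = 0;

    std::function<void()> reloadOnce = [&] {
        {
            std::lock_guard<std::mutex> lk(m);
            if (++runs >= times) {
                done.notify_one();
                return;  // enough runs: stop re-arming
            }
        }
        executor.scheduleAfter(period, reloadOnce);
    };
    executor.scheduleAfter(period, reloadOnce);

    {
        std::unique_lock<std::mutex> lk(m);
        done.wait(lk, [&] { return runs >= times; });
    }
    executor.shutdown();  // join before the locals captured by reloadOnce go away
    return runs;
}
```

      Because each run re-arms itself only after its work returns, runs never overlap. The catch is that the work itself executes on the executor's only thread, so anything it blocks on must not need that thread.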

      From _internalReload(), we call reload(), which calls getAllShards() on the ShardingCatalogClientImpl. getAllShards() creates a Fetcher instance, which launches network work and then blocks in join() until that work finishes. However, when this runs we are already on NetworkInterfaceASIO's thread, so it breaks the contract that callbacks running on NetworkInterfaceASIO must not perform blocking work. Worse, when the Fetcher's work is issued through the same TaskExecutor, the thread will deadlock: it is blocked in join() waiting on work that needs that same thread to make progress.
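      The deadlock can be reproduced in miniature. This sketch assumes a hypothetical `SingleThreadExecutor` whose one thread plays the role of NetworkInterfaceASIO's thread. `blockingReloadCompletes()` mimics reload() scheduling a fetch and then blocking on it the way Fetcher::join() does; `asyncReloadCompletes()` shows the non-blocking alternative of finishing the reload from a continuation. A bounded `wait_for` replaces the real unbounded wait so the deadlock shows up as a timeout rather than a hang. None of these names are MongoDB APIs, and the "fetch" is modeled as a plain scheduled task.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical executor with exactly one worker thread, standing in for the
// network thread that TaskExecutor jobs run on. Not MongoDB's real API.
class SingleThreadExecutor {
public:
    using Task = std::function<void()>;

    SingleThreadExecutor() : _thread([this] { _run(); }) {}
    ~SingleThreadExecutor() { shutdown(); }

    // Queue `task` to run on the executor thread, in FIFO order.
    void schedule(Task task) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_stopping) return;  // drop work submitted after shutdown
        _queue.push_back(std::move(task));
        _cv.notify_one();
    }

    // Stop the loop (pending tasks are dropped) and join the thread.
    void shutdown() {
        { std::lock_guard<std::mutex> lk(_mutex); _stopping = true; }
        _cv.notify_one();
        if (_thread.joinable()) _thread.join();
    }

private:
    void _run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (true) {
            _cv.wait(lk, [this] { return _stopping || !_queue.empty(); });
            if (_stopping) return;
            Task task = std::move(_queue.front());
            _queue.pop_front();
            lk.unlock();
            task();  // run unlocked so the task can schedule follow-up work
            lk.lock();
        }
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<Task> _queue;
    bool _stopping = false;
    std::thread _thread;  // declared last: starts after the members it uses exist
};

// Blocking variant: a task on the executor thread schedules the "fetch" on the
// same executor, then waits for it. Returns whether the fetch finished in time.
bool blockingReloadCompletes(std::chrono::milliseconds timeout) {
    SingleThreadExecutor executor;
    std::promise<bool> outcome;
    std::future<bool> outcomeFuture = outcome.get_future();
    executor.schedule([&executor, &outcome, timeout] {
        // Shared so the response task stays valid if it runs after we return.
        auto fetched = std::make_shared<std::promise<void>>();
        std::future<void> fetchedFuture = fetched->get_future();
        executor.schedule([fetched] { fetched->set_value(); });  // the "response"
        // Like Fetcher::join(): this occupies the only executor thread, so the
        // response task queued above cannot run until this task returns.
        bool ok = fetchedFuture.wait_for(timeout) == std::future_status::ready;
        outcome.set_value(ok);
    });
    bool ok = outcomeFuture.get();
    executor.shutdown();
    return ok;
}

// Non-blocking variant: the reload never waits; it finishes from a task that
// runs once the "fetch" completes, leaving the executor thread free meanwhile.
bool asyncReloadCompletes(std::chrono::milliseconds timeout) {
    SingleThreadExecutor executor;
    auto outcome = std::make_shared<std::promise<bool>>();
    std::future<bool> outcomeFuture = outcome->get_future();
    executor.schedule([&executor, outcome] {
        executor.schedule([outcome] { outcome->set_value(true); });
    });
    bool ok = outcomeFuture.wait_for(timeout) == std::future_status::ready;
    executor.shutdown();
    return ok;
}
```

      `blockingReloadCompletes()` reports failure because the response task sits in the queue behind the very task that is waiting for it. `asyncReloadCompletes()` succeeds because no task on the executor ever waits on another task on the same executor.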

            Assignee:
            [DO NOT USE] Backlog - Sharding Team
            Reporter:
            Samantha Ritter (Inactive)
            Votes:
            0
            Watchers:
            7
