Core Server / SERVER-27900

Shutdown can get stuck behind any thread doing ShardRegistry::reload

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 3.4.2, 3.5.2
    • Component/s: Sharding
    • ALL
    • Sharding 2017-03-27, Sharding 2017-04-17

      The ShardRegistry::reload call spawns a thread to refresh the list of shards from the config server. Because this thread runs with its own OperationContext, it ends up calling ReplicationCoordinatorImpl::waitUntilOpTimeForRead without any timeout.

      As a result, the shutdown sequence can deadlock: replication cannot make progress and advance the opTime because the server is shutting down, while the reload operation cannot complete because it is waiting for the opTime to advance.

      Replication is the last entry in the shutdown sequence, so its shutdown step is never invoked in the scenario above, and waitUntilOpTimeForRead stays blocked permanently.
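
      The blocking pattern described above can be sketched in isolation. This is a minimal illustration, not MongoDB's code: OpTimeWaiter, its methods, and shutdownWithReloadInFlight are hypothetical names, and a plain integer stands in for the opTime. It assumes the wait can be made to observe either a shutdown flag or a deadline, which is what lets the waiting thread be joined during shutdown.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the state a reader waits on; NOT the real
// ReplicationCoordinatorImpl. A plain integer plays the role of the opTime.
class OpTimeWaiter {
public:
    // Blocks until the opTime reaches `target` or shutdown is signalled.
    // Returns true only if the target opTime was reached.
    bool waitUntilOpTime(long long target) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return _opTime >= target || _shuttingDown; });
        return _opTime >= target;
    }

    // Same wait, but it also gives up once `timeout` has elapsed.
    bool waitUntilOpTimeFor(long long target, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait_for(lk, timeout,
                     [&] { return _opTime >= target || _shuttingDown; });
        return _opTime >= target;
    }

    // Records a new opTime and wakes every waiter.
    void advanceOpTime(long long opTime) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _opTime = opTime;
        }
        _cv.notify_all();
    }

    // Marks the waiter as shutting down and wakes every waiter.
    void signalShutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shuttingDown = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    long long _opTime = 0;
    bool _shuttingDown = false;
};

// Simulates shutdown while a reload thread waits for an opTime that will
// never arrive. Signalling shutdown BEFORE joining lets the join return;
// without the flag in the wait predicate, join() would block forever.
bool shutdownWithReloadInFlight() {
    OpTimeWaiter waiter;
    bool reached = true;
    std::thread reload([&] { reached = waiter.waitUntilOpTime(10); });
    waiter.signalShutdown();
    reload.join();
    return reached;  // false: the wait was interrupted, not satisfied
}
```

      In this sketch, dropping the _shuttingDown check from the predicate reproduces the hang in miniature: the join waits on a thread that is itself waiting for an opTime that can no longer advance. The deadline variant only bounds the delay, whereas the shutdown flag releases the waiter as soon as shutdown begins.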

            Assignee:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Votes:
            0
            Watchers:
            6

              Created:
              Updated:
              Resolved: