  1. Core Server
  2. SERVER-27900

Shutdown can get stuck behind any thread doing ShardRegistry::reload

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.4.2, 3.5.2
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Sharding 2017-03-27, Sharding 2017-04-17
    • Linked BF Score:
      0

      Description

      The ShardRegistry::reload call spawns a thread to refresh the list of shards from the config server. Because this thread runs with its own OperationContext, it ends up calling ReplicationCoordinatorImpl::waitUntilOpTimeForRead without any timeout.

      As a result, the shutdown sequence gets stuck: replication cannot make progress and advance the opTime because the server is shutting down, and the reload operation cannot proceed because it is waiting for the opTime to advance.

      The underlying cause is that replication is the last entry in the shutdown sequence. In the scenario above, shutdown never reaches that entry, so the replication shutdown step is never invoked and waitUntilOpTimeForRead stays blocked permanently.
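
      The hang described above can be illustrated with a small standalone sketch. This is not MongoDB code and not the fix applied for this ticket: OpTimeWaiter, waitUntilOpTime, signalShutdown, and simulateReloadDuringShutdown are hypothetical names invented for the example. It assumes a wait that takes a timeout and also checks a shutdown flag, so that a waiter can be woken early in the shutdown sequence instead of depending on the opTime advancing.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the opTime wait; not MongoDB's real class.
class OpTimeWaiter {
public:
    // Blocks until the opTime reaches `target`, the timeout expires, or
    // shutdown is signalled. Returns true only if `target` was reached.
    bool waitUntilOpTime(long target, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait_for(lk, timeout, [&] { return _shutdown || _opTime >= target; });
        return _opTime >= target;
    }

    // Called by "replication" when it makes progress.
    void advanceTo(long opTime) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            assert(opTime >= _opTime);  // the opTime never moves backwards
            _opTime = opTime;
        }
        _cv.notify_all();
    }

    // Called early in the shutdown sequence to wake every waiter.
    void signalShutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    long _opTime = 0;
    bool _shutdown = false;
};

// Models the scenario from the description: a reload thread waits for an
// opTime that never arrives because the server is shutting down.
// Returns whether the reload thread saw its target opTime.
bool simulateReloadDuringShutdown() {
    OpTimeWaiter waiter;
    bool reached = true;
    std::thread reload([&] {
        reached = waiter.waitUntilOpTime(10, std::chrono::seconds(5));
    });
    waiter.signalShutdown();  // without this wake-up, join() blocks until the timeout
    reload.join();
    return reached;
}
```

      In this sketch, removing both the timeout and the shutdown check from waitUntilOpTime reproduces the reported behaviour: join() would never return, because nothing advances the opTime once shutdown has begun.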
