Core Server: SERVER-61950

ReshardingOplogFetcher waits uninterruptibly for a network request to complete, potentially preventing shard step-up from ever completing


Details

    • Fully Compatible
    • ALL
    • v5.2, v5.1, v5.0
    • Sharding 2021-12-13
    • 50
    • 3

    Description

      The ReshardingOplogFetcher uses ShardRemote::runAggregation() to run resharding's oplog fetching pipeline. ShardRemote::runAggregation() uses the Fetcher class to schedule the remote network requests. Fetcher::join() doesn't wait using an OperationContext, so it cannot be interrupted and continues to block even after the node steps down. For as long as the network request keeps running on the remote node, the still-active Instance prevents PrimaryOnlyService::onStepUp(), and therefore the overall step-up procedure, from completing.
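      The following C++ sketch models the problem in isolation. `UninterruptibleFetcher` and `simulateStepDownDuringJoin()` are hypothetical names, not MongoDB's `Fetcher` or any real server code; the sketch only shows why a join() that never consults an interruption source keeps its caller blocked for as long as the remote request runs, regardless of a step-down.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a fetcher (NOT mongo::Fetcher). It only tracks
// whether the remote request is still outstanding.
class UninterruptibleFetcher {
public:
    // Invoked by the simulated network layer when the remote response arrives.
    void onRemoteResponse() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _active = false;
        }
        _cv.notify_all();
    }

    bool isActive() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    // Uninterruptible join: with no OperationContext to consult, the only
    // way out of this wait is for the remote request to finish.
    void join() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return !_active; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _active = true;
};

// Pretends the node steps down right away while the remote request takes
// `remoteLatency` to finish. Returns how long join() kept the caller blocked.
std::chrono::milliseconds simulateStepDownDuringJoin(
        std::chrono::milliseconds remoteLatency) {
    UninterruptibleFetcher fetcher;
    std::thread network([&] {
        std::this_thread::sleep_for(remoteLatency);
        fetcher.onRemoteResponse();
    });

    // Step-down would happen here, but join() has no way to observe it.
    const auto start = std::chrono::steady_clock::now();
    fetcher.join();
    const auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    assert(!fetcher.isActive());  // join() only returns once the remote is done

    network.join();
    return waited;
}
```

      Calling `simulateStepDownDuringJoin(std::chrono::milliseconds(200))` blocks for roughly the full 200 ms remote latency, which in the real server corresponds to step-up being held up by the remote node.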

      We should instead have Fetcher::join() wait using an OperationContext, so that the Fetcher::~Fetcher() destructor can abandon waiting for the remote network request once that OperationContext is interrupted, for example when the node steps down.
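      A minimal sketch of the proposed direction, again in C++ with hypothetical names (`ToyOperationContext`, `InterruptibleFetcher`, `JoinResult`, `simulateStepDownDuringJoin()`). These are not MongoDB's `OperationContext` or `Fetcher` APIs; the point is only that a join() taking an operation context can give up once that context is killed, which is what would let a destructor stop waiting on a remote request that is still running.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an OperationContext (NOT mongo::OperationContext):
// it only models being killed, e.g. when the node steps down.
class ToyOperationContext {
public:
    void markKilled() { _killed.store(true); }
    bool isKilled() const { return _killed.load(); }

private:
    std::atomic<bool> _killed{false};
};

enum class JoinResult { kRemoteCompleted, kInterrupted };

// Hypothetical fetcher whose join() accepts an operation context.
class InterruptibleFetcher {
public:
    // Invoked by the simulated network layer when the remote response arrives.
    void onRemoteResponse() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _active = false;
        }
        _cv.notify_all();
    }

    bool isActive() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    // Returns early if opCtx is killed, even if the remote request is still running.
    JoinResult join(const ToyOperationContext& opCtx) {
        std::unique_lock<std::mutex> lk(_mutex);
        while (_active) {
            if (opCtx.isKilled()) {
                return JoinResult::kInterrupted;
            }
            // Wake periodically so a kill is noticed within about 10 ms.
            _cv.wait_for(lk, std::chrono::milliseconds(10));
        }
        return JoinResult::kRemoteCompleted;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _active = true;
};

struct JoinOutcome {
    JoinResult result;
    std::chrono::milliseconds waited;
};

// Kills the operation after `stepDownAfter` while the remote request takes
// `remoteLatency`; reports how join() returned and how long it blocked.
JoinOutcome simulateStepDownDuringJoin(std::chrono::milliseconds remoteLatency,
                                       std::chrono::milliseconds stepDownAfter) {
    InterruptibleFetcher fetcher;
    ToyOperationContext opCtx;
    std::thread network([&] {
        std::this_thread::sleep_for(remoteLatency);
        fetcher.onRemoteResponse();
    });
    std::thread stepDown([&] {
        std::this_thread::sleep_for(stepDownAfter);
        opCtx.markKilled();
    });

    const auto start = std::chrono::steady_clock::now();
    const JoinResult result = fetcher.join(opCtx);
    const auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    if (result == JoinResult::kRemoteCompleted) {
        assert(!fetcher.isActive());
    }

    stepDown.join();
    // The simulated network thread still references `fetcher`, so keep it
    // alive until that thread has delivered its (now late) response.
    network.join();
    return {result, waited};
}
```

      The toy join() wakes every 10 ms to re-check the kill flag, which keeps the sketch short; a real implementation would more likely have the kill wake the waiter directly. Because the simulated network thread still references the fetcher after an interrupted join(), the demo keeps the fetcher alive until that thread finishes; real code likewise has to ensure a late response cannot touch freed state.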


    People

      Assignee: Max Hirschhorn (max.hirschhorn@mongodb.com)
      Reporter: Max Hirschhorn (max.hirschhorn@mongodb.com)
      Votes: 0
      Watchers: 2
