Core Server / SERVER-61950

ReshardingOplogFetcher waits on network request completing without interruption, potentially preventing shard step-up from ever completing

    • Fully Compatible
    • ALL
    • v5.2, v5.1, v5.0
    • Sharding 2021-12-13
    • 50
    • 3

      The ReshardingOplogFetcher uses ShardRemote::runAggregation() to run resharding's oplog fetching pipeline, and ShardRemote::runAggregation() uses the Fetcher class to schedule the remote network requests. Fetcher::join() doesn't wait using an OperationContext, so it can't be interrupted and keeps blocking even after the node steps down. For as long as the network request continues to run on the remote node, the still-active Instance prevents PrimaryOnlyService::onStepUp(), and with it the overall step-up procedure, from completing.
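To make the hang concrete, here is a minimal, hypothetical C++ model of it. ToyFetcher and cleanupFinishesWithin are invented for illustration and are not MongoDB classes or functions. The toy join() waits only for the remote request to finish, so a waiter standing in for step-up cannot finish until the remote side replies.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical toy model, NOT MongoDB's Fetcher: join() waits for the remote
// request to finish and offers no way to be interrupted (e.g. by a step-down).
class ToyFetcher {
public:
    // Called by the simulated network layer when the remote aggregate replies.
    void onRemoteResponse() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = false;
        _cv.notify_all();
    }

    // Uninterruptible join: returns only once the remote request is done.
    void join() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return !_active; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _active = true;
};

// Hypothetical helper standing in for step-up waiting on the old Instance:
// runs the Instance's cleanup (which calls join()) on another thread and
// reports whether it finished within `budget`.
bool cleanupFinishesWithin(ToyFetcher& fetcher, std::chrono::milliseconds budget) {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    std::thread cleanup([&] {
        fetcher.join();
        std::lock_guard<std::mutex> lk(m);
        done = true;
        cv.notify_all();
    });

    bool finished = false;
    {
        std::unique_lock<std::mutex> lk(m);
        finished = cv.wait_for(lk, budget, [&] { return done; });
    }

    // Let the remote "reply" so the cleanup thread can exit and be joined.
    fetcher.onRemoteResponse();
    cleanup.join();
    return finished;
}
```

When the remote never replies, cleanupFinishesWithin() returns false. The cleanup thread stays parked in join() no matter what else happens on the node, which mirrors how the still-active Instance holds up onStepUp().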

      We should instead have Fetcher::join() wait using an OperationContext so the Fetcher::~Fetcher() destructor can abandon waiting for the remote network request.
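The proposed direction can be sketched with a hypothetical toy model, assuming only that the waiting context can be killed and that killing it wakes the waiter. ToyOpCtx, waitOrInterrupt(), markKilled(), and ToyFetcher are illustrative names, not MongoDB's OperationContext or Fetcher API. join() takes the caller's context and returns false instead of blocking once that context is killed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an OperationContext. It records which mutex and
// condition variable its owner is blocked on so markKilled() can wake it.
class ToyOpCtx {
public:
    // Blocks until pred() holds or this context is killed.
    // Returns true if pred() holds, false if the wait was interrupted.
    template <typename Pred>
    bool waitOrInterrupt(std::condition_variable& cv,
                         std::unique_lock<std::mutex>& lk, Pred pred) {
        setWaiter(lk.mutex(), &cv);
        cv.wait(lk, [&] { return pred() || isKilled(); });
        setWaiter(nullptr, nullptr);
        return pred();
    }

    // Simulates interruption, e.g. because the node stepped down.
    void markKilled() {
        std::mutex* waitMutex = nullptr;
        std::condition_variable* waitCV = nullptr;
        {
            std::lock_guard<std::mutex> g(_m);
            _killed = true;
            waitMutex = _waitMutex;
            waitCV = _waitCV;
        }
        if (waitMutex != nullptr) {
            // Holding the waiter's mutex means the waiter is already blocked in
            // wait() (or has finished), so this notification cannot be lost.
            std::lock_guard<std::mutex> g(*waitMutex);
            waitCV->notify_all();
        }
    }

    bool isKilled() {
        std::lock_guard<std::mutex> g(_m);
        return _killed;
    }

private:
    void setWaiter(std::mutex* m, std::condition_variable* cv) {
        std::lock_guard<std::mutex> g(_m);
        _waitMutex = m;
        _waitCV = cv;
    }

    std::mutex _m;
    bool _killed = false;
    std::mutex* _waitMutex = nullptr;
    std::condition_variable* _waitCV = nullptr;
};

// Hypothetical toy fetcher whose join() waits through the caller's context.
class ToyFetcher {
public:
    void onRemoteResponse() {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = false;
        _cv.notify_all();
    }

    // Returns true if the remote request finished, false if the caller gave up
    // because its context was killed; the request itself is left running.
    bool join(ToyOpCtx& opCtx) {
        std::unique_lock<std::mutex> lk(_mutex);
        return opCtx.waitOrInterrupt(_cv, lk, [this] { return !_active; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _active = true;
};
```

Returning false instead of blocking is what would let a destructor built on this join() stop waiting while the remote request is still in flight. How the real code then cancels or abandons that request is outside this sketch.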

            Max Hirschhorn (max.hirschhorn@mongodb.com)