  Core Server / SERVER-54333

Consider increasing MigrationDestinationManager::startCommit timeout

    • Fully Compatible
    • 9

      If a network error occurs while the moveChunk receiver is communicating with the config server, the operation hangs for 30 seconds and then fails, because the startCommit timeout is equal to the delay before a failed network request is retried.

      Detailed explanation

      In the moveChunk flow, on the receiver side, the migrateThread calls MigrationDestinationManager::_migrateDriver to perform the necessary steps. Once the driver has finished, the thread notifies the _isActiveCV condition variable, on which startCommit waits for a maximum of 30 seconds.
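
      The wait described above can be sketched with a standard condition variable. This is only an illustration of the pattern, not the server code: the class MigrationSketch, the _active flag, and the helper commitSucceeds are hypothetical names, and the durations are scaled down to milliseconds so the example runs quickly.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the receiver-side state; not MongoDB's real class.
class MigrationSketch {
public:
    // Plays the role of migrateThread: does the work, then signals completion.
    void runDriver(std::chrono::milliseconds workDuration) {
        // Stands in for the _migrateDriver steps (including config-server writes).
        std::this_thread::sleep_for(workDuration);
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _active = false;
        }
        _isActiveCV.notify_all();
    }

    // Plays the role of startCommit: waits for the driver, but only up to `timeout`.
    // Returns true if the driver finished in time, false if the wait timed out.
    bool startCommit(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _isActiveCV.wait_for(lk, timeout, [this] { return !_active; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _isActiveCV;
    bool _active = true;
};

// Runs one simulated migration and reports whether the commit saw the driver
// finish before its timeout expired.
bool commitSucceeds(std::chrono::milliseconds workDuration,
                    std::chrono::milliseconds timeout) {
    assert(workDuration.count() >= 0 && timeout.count() >= 0);
    MigrationSketch m;
    std::thread migrateThread([&] { m.runDriver(workDuration); });
    const bool ok = m.startCommit(timeout);
    migrateThread.join();
    return ok;
}
```

      In this sketch, a driver that is delayed past the timeout (for example by a slow retry) makes startCommit return false even though the driver would have completed shortly afterwards.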

      After each step of MigrationDestinationManager::_migrateDriver, the state is logged on the CSRS through the MoveTimingHelper, which calls into the ShardingLogger to insert a config document. As highlighted in SERVER-51397, if a network partition happens during a CatalogClient request, the first retry happens only after 30 seconds. That is too late, because the startCommit timeout is also exactly 30 seconds.
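
      The timing argument can be checked with a small model. The function below is a hypothetical helper, not part of the server. It assumes time is measured from the moment startCommit begins waiting and that the failing config-server write is issued failureOffset after that moment. The 60-second value in the usage note is purely illustrative and is not a value proposed in this ticket.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Hypothetical model, not MongoDB code. Returns true if the first retry of a
// failed config-server request can be sent before the startCommit wait expires.
bool retryLandsBeforeTimeout(seconds failureOffset,
                             seconds retryDelay,
                             seconds commitTimeout) {
    assert(failureOffset.count() >= 0 && retryDelay.count() >= 0);
    // Earliest moment at which the retried request can be sent.
    const seconds retrySentAt = failureOffset + retryDelay;
    // The retry only helps if it is sent strictly before the wait expires.
    return retrySentAt < commitTimeout;
}
```

      With a 30-second retry delay and a 30-second timeout, retryLandsBeforeTimeout(seconds(0), seconds(30), seconds(30)) is false for every non-negative offset, which matches the failure described above. With a longer timeout such as seconds(60), the same call returns true, leaving the retry some room to succeed.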

            pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli