[SERVER-54333] Consider increasing MigrationDestinationManager::startCommit timeout Created: 05/Feb/21  Updated: 29/Oct/23  Resolved: 18/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Pierlauro Sciarelli
Resolution: Fixed Votes: 0
Labels: Sharding-EMEA, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Participants:
Linked BF Score: 9

 Description   

If any network error occurs while the moveChunk receiver communicates with the config server, the operation fails after hanging for 30 seconds, because the startCommit timeout is equal to the delay before a failed network request is retried.

Detailed explanation

In the moveChunk flow, on the receiver side, the migrateThread calls MigrationDestinationManager::_migrateDriver to perform the necessary steps. Afterwards, the thread notifies the _isActiveCV condition variable, on which startCommit waits for at most 30 seconds.
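The wait described above follows the standard condition-variable-with-timeout pattern. The C++ sketch below illustrates it under stated assumptions: `DestinationSketch`, `startMigration` and the `_isActive` flag are hypothetical names rather than MongoDB's actual implementation, and the driver's work is simulated with a sleep. Durations are passed in as parameters so the 30-second timeout can be shortened for experimentation.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the receiver-side state; not MongoDB's actual class.
class DestinationSketch {
public:
    // Runs simulated migration steps on a separate thread (analogous to the
    // migrateThread calling _migrateDriver), then clears the active flag and
    // notifies every waiter on the condition variable.
    void startMigration(std::chrono::milliseconds driverDuration) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _isActive = true;
        }
        _thread = std::thread([this, driverDuration] {
            std::this_thread::sleep_for(driverDuration);  // simulated driver steps
            {
                std::lock_guard<std::mutex> lk(_mutex);
                _isActive = false;
            }
            _isActiveCV.notify_all();
        });
    }

    // Analogous to startCommit: waits at most `timeout` for the driver to finish.
    // Returns true if the driver finished in time, false if the wait timed out.
    bool startCommit(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _isActiveCV.wait_for(lk, timeout, [this] { return !_isActive; });
    }

    // Join the driver thread before the members it uses are destroyed.
    ~DestinationSketch() {
        if (_thread.joinable()) _thread.join();
    }

private:
    std::mutex _mutex;
    std::condition_variable _isActiveCV;
    bool _isActive = false;
    std::thread _thread;
};
```

The predicate overload of `wait_for` guards against spurious wake-ups and returns the predicate's final value, so a `false` result means the deadline passed while the driver was still active.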

After each step of MigrationDestinationManager::_migrateDriver, the state is logged on the CSRS (config server replica set) through the MoveTimingHelper, which calls into the ShardingLogger to insert a document on the config server. As highlighted in SERVER-51397, if a network partition occurs during a CatalogClient request, the first retry happens only after 30 seconds. That is too late: the startCommit timeout is also exactly 30 seconds, so startCommit gives up before the retried request can complete.
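The timing problem can be modelled with simple arithmetic. In this hypothetical C++ sketch (`StepSketch`, `driverDuration` and `commitWouldSucceed` are illustrative names, not MongoDB code), time is measured from the moment startCommit starts waiting, each remaining driver step does some work and then logs its state, and a log write that hits a network error is assumed to succeed on a single retry issued after `retryDelay`.

```cpp
#include <chrono>
#include <vector>

using std::chrono::milliseconds;
using std::chrono::seconds;

// Hypothetical model of one remaining driver step: some work, followed by a
// state-logging write to the config server that may fail once with a network error.
struct StepSketch {
    milliseconds work;
    bool logWriteFailsOnce;
};

// Total time the driver needs, assuming a failed log write is retried exactly
// once after `retryDelay` and the retry succeeds.
milliseconds driverDuration(const std::vector<StepSketch>& steps, milliseconds retryDelay) {
    milliseconds total{0};
    for (const auto& step : steps) {
        total += step.work;
        if (step.logWriteFailsOnce) total += retryDelay;  // wait before the retry
    }
    return total;
}

// startCommit succeeds only if the driver finishes within the commit timeout.
bool commitWouldSucceed(const std::vector<StepSketch>& steps,
                        milliseconds retryDelay,
                        milliseconds commitTimeout) {
    return driverDuration(steps, retryDelay) <= commitTimeout;
}
```

With the retry delay equal to the 30-second commit timeout, a single failed log write plus any nonzero amount of work pushes the driver past the deadline. A timeout comfortably above the retry delay (60 seconds in the test is an arbitrary illustrative value, not the value chosen by the fix) leaves room for the retry to complete.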



 Comments   
Comment by Githook User [ 18/Mar/21 ]

Author: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com, username: pierlauro)

Message: SERVER-54333 Consider increasing MigrationDestinationManager::startCommit timeout
Branch: master
https://github.com/mongodb/mongo/commit/138acafd27f145622506ef422a48c056fb4883df
