[SERVER-39445] Executing remote command request during collection cloning in initial sync should not use RemoteCommandRequest::kNoTimeout. Created: 08/Feb/19  Updated: 27/Oct/23  Resolved: 06/Jan/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Backlog - Replication Team
Resolution: Gone away Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Replication
Operating System: ALL
Participants:
Linked BF Score: 19

 Description   

Currently, during the collection cloning phase of initial sync, the sync target issues many remote command requests with RemoteCommandRequest::kNoTimeout. This can cause the sync target to hang forever on those remote commands if a network issue prevents it from reaching the sync source. My suggestion is to set a deadline on those remote commands, as we do for oplog fetching. When a command times out, initial sync fails. This at least allows the sync target to retry initial sync with a different sync source.

The following remote commands are issued with kNoTimeout:
1) List databases
2) List collections
3) Count collection
4) List indexes
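
The suggested behavior can be sketched in standalone C++ as follows. This is only an illustration of the deadline idea, not the server's code: runWithDeadline, firstResponsiveSource, CommandStatus, and the local kNoTimeout constant are hypothetical names made up for this sketch, and a plain helper thread stands in for the real networking layer.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical sentinel for this sketch only; it mimics the idea of
// RemoteCommandRequest::kNoTimeout but is not the server's definition.
constexpr Milliseconds kNoTimeout{-1};

enum class CommandStatus { kOk, kTimedOut };

// Runs `command` on a helper thread and waits for it. With kNoTimeout the
// caller blocks until the command returns, however long that takes; with a
// real timeout the caller gives up once the deadline passes.
CommandStatus runWithDeadline(std::function<void()> command, Milliseconds timeout) {
    assert(timeout == kNoTimeout || timeout.count() >= 0);

    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    // Detached so that an unresponsive command cannot block the caller. A real
    // implementation would cancel the request instead of abandoning it.
    std::thread([done, command = std::move(command)] {
        command();
        done->set_value();
    }).detach();

    if (timeout == kNoTimeout) {
        finished.wait();  // May never return if the command hangs.
        return CommandStatus::kOk;
    }
    return finished.wait_for(timeout) == std::future_status::ready
        ? CommandStatus::kOk
        : CommandStatus::kTimedOut;
}

// Tries each candidate sync source in order and returns the index of the first
// one whose command completes before the deadline, or nullopt if all time out.
std::optional<std::size_t> firstResponsiveSource(
    const std::vector<std::function<void()>>& sources, Milliseconds timeout) {
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (runWithDeadline(sources[i], timeout) == CommandStatus::kOk) {
            return i;
        }
    }
    return std::nullopt;
}
```

In this sketch, a command that never returns would block the caller indefinitely under kNoTimeout, while any finite timeout turns the hang into a kTimedOut result that the caller can act on, for example by moving on to the next candidate sync source.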



 Comments   
Comment by Judah Schvimer [ 26/Sep/19 ]

matthew.russotto, will this go away with Resumable Initial Sync's cloner refactor?
