Core Server / SERVER-39445

Remote command requests executed during collection cloning in initial sync should not use RemoteCommandRequest::kNoTimeout.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Gone away
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Operating System: ALL
    • Linked BF Score: 56

      Description

      Currently, during the collection cloning phase of initial sync, the sync target sends many remote command requests to the sync source with RemoteCommandRequest::kNoTimeout. If a network problem prevents it from reaching the sync source, the sync target can hang forever on those commands. I suggest setting a deadline on these remote commands, as we already do for oplog fetching. When a command times out, initial sync fails, which at least allows the sync target to retry initial sync with a different sync source.

      The following remote commands are issued with kNoTimeout:
      1) List databases
      2) List collections
      3) Count collection
      4) List indexes
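
      The difference between a deadline and kNoTimeout can be illustrated with a small simulation. This is only a sketch and not the server's code: SimulatedSource, runCommand, pickResponsiveSource and kNoTimeoutSketch are hypothetical names invented for this example, and no real network or executor is involved. It assumes that a timed-out command fails the attempt so that another sync source can be tried, as suggested above.

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <vector>

using Milliseconds = std::chrono::milliseconds;

// Hypothetical sentinel meaning "wait forever"; it stands in for
// RemoteCommandRequest::kNoTimeout in this sketch only.
constexpr Milliseconds kNoTimeoutSketch{-1};

// A simulated sync source. responseDelay is how long it takes to answer;
// std::nullopt models a source that is unreachable and never answers.
struct SimulatedSource {
    std::string host;
    std::optional<Milliseconds> responseDelay;
};

enum class Outcome { kOk, kTimedOut, kHung };

// Simulates one remote command (for example "list databases") sent to source.
Outcome runCommand(const SimulatedSource& source, Milliseconds timeout) {
    assert(timeout == kNoTimeoutSketch || timeout.count() >= 0);
    const bool hasDeadline = timeout != kNoTimeoutSketch;
    if (!source.responseDelay) {
        // No reply will ever arrive: only a deadline lets the caller give up.
        return hasDeadline ? Outcome::kTimedOut : Outcome::kHung;
    }
    if (hasDeadline && *source.responseDelay > timeout) {
        return Outcome::kTimedOut;
    }
    return Outcome::kOk;
}

// Tries each candidate in order. A timed-out attempt fails and the next
// candidate is tried. Returns the first host that answers, or std::nullopt
// if no candidate answered or an attempt hung (which in a real system
// would mean this function never returns).
std::optional<std::string> pickResponsiveSource(
    const std::vector<SimulatedSource>& candidates, Milliseconds timeout) {
    for (const auto& source : candidates) {
        switch (runCommand(source, timeout)) {
            case Outcome::kOk:
                return source.host;
            case Outcome::kTimedOut:
                continue;  // this attempt failed; move on to the next source
            case Outcome::kHung:
                return std::nullopt;  // stuck on the unreachable source
        }
    }
    return std::nullopt;
}
```

      With candidates {unreachable host, healthy host}, a finite timeout lets pickResponsiveSource skip the unreachable host and return the healthy one, while kNoTimeoutSketch leaves the attempt stuck on the first host.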
