  1. Core Server
  2. SERVER-30728

Low Azure socket timeout may cause initial sync failure

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Admin
    • Labels:
      None
    • Operating System:
      ALL

      Description

      Hi Team,

      We are running a MongoDB instance on an Azure VM with default settings. We have noticed that the Azure VM tends to close a socket connection if it has been idle for several minutes. When we try to run an initial sync from MongoDB on the Azure VM to another replica set member, the sync always fails: the connection is dropped whenever there is no network traffic for several minutes (e.g., while the instance that is starting up is building an index), and the initial sync starts all over again.

      A sample log snippet:

      [building index here...]
      2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] Socket recv() errno:10053 An established connection was aborted by the software in your host machine. [***.***.***.***:*****]
      2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [***.***.***.***:*****]
      2017-08-17T19:06:30.632+0800 I NETWORK [rsSync] DBClientCursor::init call() failed
      2017-08-17T19:06:30.640+0800 E REPL [rsSync] 13386 socket error for mapping query
      2017-08-17T19:06:30.640+0800 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
      2017-08-17T19:06:35.641+0800 I REPL [rsSync] initial sync pending
      2017-08-17T19:06:35.643+0800 I REPL [ReplicationExecutor] syncing from: <HOSTNAME>:*****
      2017-08-17T19:06:36.454+0800 I REPL [rsSync] initial sync drop all databases
      2017-08-17T19:06:36.454+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 14
      2017-08-17T19:06:43.928+0800 I REPL [rsSync] initial sync clone all databases

      For a MongoDB client, this can be resolved by setting MaxConnectionIdleTime, but there seems to be no way to configure the equivalent for replica set members. As a result, Azure users who do not tweak OS settings will find it hard to sync data out of an Azure VM to another replica set member.
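
To illustrate the kind of OS-level tweak mentioned above, here is a minimal Python sketch of enabling TCP keepalive on a socket, so that an otherwise idle connection still carries periodic probes. This is not MongoDB code. The function name `enable_keepalive` and the default values (120 s idle, 30 s interval, 5 probes) are assumptions chosen only to sit below a several-minute idle timeout.

```python
import socket
import sys


def enable_keepalive(sock, idle_s=120, interval_s=30, probes=5):
    """Enable TCP keepalive on sock (hypothetical helper, assumed defaults)."""
    # Ask the OS to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32" and hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    else:
        # These constants are platform dependent, so set only those present.
        if hasattr(socket, "TCP_KEEPIDLE"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        if hasattr(socket, "TCP_KEEPINTVL"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        if hasattr(socket, "TCP_KEEPCNT"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

This only helps for sockets the application opens itself. For connections opened internally by mongod, the equivalent values would have to be set system-wide in the OS.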

      Can we have an option either to specify a maximum connection idle time for replica set members, or to make initial sync not fail completely on a single connection failure?
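
To make the second request concrete, here is a rough Python sketch of a copy loop that resumes from the last copied position after a dropped connection instead of starting over. It does not reflect how MongoDB implements initial sync. `copy_with_resume` and `fetch_batch` are hypothetical names, and the default of 10 attempts only mirrors the "9 attempts remaining" line in the log above.

```python
def copy_with_resume(fetch_batch, total, max_attempts=10):
    """Copy `total` items, resuming after connection errors.

    fetch_batch(offset) is a hypothetical callable that returns a list of
    items starting at `offset`, or raises ConnectionError if the socket drops.
    """
    copied = []
    attempts_left = max_attempts
    while len(copied) < total:
        try:
            batch = fetch_batch(len(copied))
        except ConnectionError:
            attempts_left -= 1
            if attempts_left == 0:
                raise  # give up only after repeated failures
            continue  # retry from the last copied offset, keeping progress
        if not batch:
            break  # source has no more data
        copied.extend(batch)
    return copied
```

The point of the sketch is that a single failure costs one retry of the current batch rather than a drop of everything copied so far.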

            People

            Assignee:
            ramon.fernandez Ramon Fernandez Marina
            Reporter:
            wekurtz WenniZ
            Votes:
            0
            Watchers:
            9