Core Server / SERVER-9351

3 node replica set fresh config - failure after initial mongoimport


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 2.4.1
    • Component/s: Replication
    • Labels: None
    • Environment: Ubuntu 12.04.2 LTS 3.2.0-40-virtual, 64-bit, hosted on AWS EC2
    • Operating System: ALL
    • Steps To Reproduce:

      1) Create and initiate a clean 3-node replica set.

      • Place two nodes in one AZ (sub-1 ms latency, near-1 GigE networking).
      • Place the third node in another AWS AZ, 2 ms away with 3 hops, via VPN.

      2) Run mongoimport against {NODE 1}.

      3) Wait for replication on {NODE 3} to fail.

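As a rough illustration of the topology in the steps above, the sketch below builds a replica set configuration document with one member per node. The set name `rs0` and the `*.example.net` host names are hypothetical placeholders, not values from this ticket; only the AZ placement is taken from the report.

```python
import json

# Hypothetical names: the set name and host names are placeholders, not from the ticket.
REPLSET_NAME = "rs0"

# AZ placement as reported: two nodes in AZ2, one node in AZ1 (reached via VPN).
nodes = [
    {"name": "NODE 1", "az": "AZ2", "host": "node1.example.net:27017"},
    {"name": "NODE 2", "az": "AZ2", "host": "node2.example.net:27017"},
    {"name": "NODE 3", "az": "AZ1", "host": "node3.example.net:27017"},
]


def build_replset_config(set_name, nodes):
    """Build a replica set configuration document with one member per node."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": n["host"]} for i, n in enumerate(nodes)],
    }


config = build_replset_config(REPLSET_NAME, nodes)
print(json.dumps(config, indent=2))

# With a driver such as PyMongo, this document could be sent to one node, e.g.
#   MongoClient("node1.example.net", 27017).admin.command("replSetInitiate", config)
# (not executed here; it needs running mongod instances).
```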

    Description

      After setting up replication following the "Geographically Distributed Sets" architecture pattern (2 nodes in one AZ and 1 node in another AZ, connected via VPN, as per Amazon's recommended design), running a fresh mongoimport from a client on NODE 1 against the server on NODE 1 triggers replication problems.

      NODE 1 - primary, AZ2 (availability zone)
      NODE 2 - secondary, AZ2
      NODE 3 - secondary, AZ1

      PROBLEM
      ----------
      Replication "locks up" on {NODE 3} and does not recover, either by waiting or by restarting the MongoDB server on {NODE 3}.

      The mongo client on {NODE 3} responds very slowly (up to 30 seconds of lag), even when pressing Enter with no command.

      Error logs:
      -------------
      Mon Apr 15 08:02:55.026 [rsBackgroundSync] Socket recv() timeout {NODE 1}
      Mon Apr 15 08:02:55.026 [rsBackgroundSync] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
      Mon Apr 15 08:02:55.026 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: {NODE 1}
      Mon Apr 15 08:02:56.050 [rsSyncNotifier] Socket recv() timeout {NODE 1}
      Mon Apr 15 08:02:56.050 [rsSyncNotifier] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
      Mon Apr 15 08:02:56.050 [rsSyncNotifier] DBClientCursor::init call() failed
      Mon Apr 15 08:02:57.050 [rsSyncNotifier] replset tracking exception: exception: 9001 socket exception [FAILED_STATE] for {NODE 1}
      Mon Apr 15 08:02:58.051 [rsSyncNotifier] replset setting oplog notifier to {NODE 1}
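For triage, the log excerpt above can be split into timestamp, thread, and message to see which replication threads are failing. This is a minimal sketch that assumes every line follows the `<timestamp> [<thread>] <message>` shape seen in the excerpt; the regular expression and helper name are illustrative and not part of MongoDB.

```python
import re
from collections import Counter

# Log lines copied verbatim from the ticket.
LOG = """\
Mon Apr 15 08:02:55.026 [rsBackgroundSync] Socket recv() timeout {NODE 1}
Mon Apr 15 08:02:55.026 [rsBackgroundSync] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
Mon Apr 15 08:02:55.026 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: {NODE 1}
Mon Apr 15 08:02:56.050 [rsSyncNotifier] Socket recv() timeout {NODE 1}
Mon Apr 15 08:02:56.050 [rsSyncNotifier] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
Mon Apr 15 08:02:56.050 [rsSyncNotifier] DBClientCursor::init call() failed
Mon Apr 15 08:02:57.050 [rsSyncNotifier] replset tracking exception: exception: 9001 socket exception [FAILED_STATE] for {NODE 1}
Mon Apr 15 08:02:58.051 [rsSyncNotifier] replset setting oplog notifier to {NODE 1}
"""

# Assumed line shape: "<Day Mon DD HH:MM:SS.mmm> [<thread>] <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<thread>\w+)\] (?P<msg>.*)$"
)


def parse_log(text):
    """Return a list of {ts, thread, msg} dicts, skipping lines that do not match."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse_log(LOG)
per_thread = Counter(e["thread"] for e in entries)
timeouts = [e for e in entries if "recv() timeout" in e["msg"]]

for thread, count in per_thread.items():
    print(f"{thread}: {count} messages")
for e in timeouts:
    print(f"timeout at {e['ts']} in {e['thread']}")
```

On this excerpt, both `rsBackgroundSync` and `rsSyncNotifier` report a `recv()` timeout against {NODE 1}, about one second apart.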

      replication status
      --------------------
      {NODE 1} state - PRIMARY, optime - 1366013200
      {NODE 2} state - SECONDARY, optime - 1366013200
      {NODE 3} state - SECONDARY, optime - 1366012945
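The lag implied by the status above can be worked out directly from the optimes. The sketch below assumes the optime values are seconds since the Unix epoch (the ticket shows them as plain integers) and treats NODE 1, the primary, as the reference.

```python
from datetime import datetime, timezone

# Optimes copied from the replication status in the ticket.
# Assumption: they are seconds since the Unix epoch.
optimes = {
    "NODE 1": 1366013200,  # PRIMARY
    "NODE 2": 1366013200,  # SECONDARY
    "NODE 3": 1366012945,  # SECONDARY
}

primary_optime = optimes["NODE 1"]


def lag_seconds(node):
    """Seconds by which a node's optime trails the primary's optime."""
    return primary_optime - optimes[node]


for node, optime in optimes.items():
    when = datetime.fromtimestamp(optime, tz=timezone.utc).isoformat()
    print(f"{node}: optime {optime} ({when}), lag {lag_seconds(node)} s")
```

Under that assumption, NODE 2 is level with the primary while NODE 3 trails it by 255 seconds.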


    People

      Assignee: Unassigned
      Reporter: David Sobon (dsobon)