Chained Replication problem when middle secondary under heavy read load

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.0.5
    • Component/s: Replication
    • None
    • ALL
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None

      we have a replset with three data node: A(master) B(slave) C(slave) in LAN, no network problem, the replset chain is: A -> B -> C. when B is under heavy read load, C sync oplog from B failed with error "socket exception RECV_TIMEOUT", but B sync from A is normal, so node B's data is update to date. after some time we find that C lag from other a lot. then we use "rs.syncFrom(A)" to let C sync oplog from A, after a few second C is update with master.

      I expect that when C sync oplog from B failed, or failed some times, will switch to master node automaticly even if B is update to date. we will disable chained replication in replset config before this problem is solved.

      some info:

      1. db version: 3.0.5
      2. chainingAllowed: true(default)
      3. SlaveReplicationInfo

      repl1:SECONDARY> db.printSlaveReplicationInfo()
      source: NODE_B:2706
          syncedTo: Wed Dec 09 2015 10:43:22 GMT+0800 (CST)
          -1 secs (0 hrs) behind the primary
      source: NODE_C:2706
          syncedTo: Wed Dec 09 2015 10:34:35 GMT+0800 (CST)
          526 secs (0.15 hrs) behind the primary
      

      4. some sync log

      2015-12-09T10:29:01.877+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:29:01.905+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:29:01.946+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:29:01.950+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:34:42.673+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:34:42.696+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:34:42.717+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:34:42.718+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:34:44.641+0800 I REPL     [rsBackgroundSync] replSet syncTail condition 1
      2015-12-09T10:34:45.656+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:35:45.853+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:35:45.853+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:35:45.853+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:35:45.853+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:36:16.721+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:36:16.721+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:36:16.721+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:36:16.721+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:37:16.933+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:37:16.933+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:37:16.933+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:37:16.933+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:37:47.189+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:37:47.189+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:37:47.189+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:37:47.189+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:37:49.456+0800 I REPL     [rsBackgroundSync] replSet syncTail condition 1
      2015-12-09T10:37:50.457+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:43:44.201+0800 I NETWORK  [rsBackgroundSync] Socket recv() timeout  NODE_B:2706
      2015-12-09T10:43:44.201+0800 I NETWORK  [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706]
      2015-12-09T10:43:44.201+0800 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706
      2015-12-09T10:43:44.201+0800 I REPL     [ReplicationExecutor] syncing from: NODE_B:2706
      2015-12-09T10:44:07.570+0800 I REPL     [ReplicationExecutor] syncing from: NODE_A:2706 by request
      2015-12-09T10:44:07.580+0800 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to NODE_A:2706
      

            Assignee:
            Kelsey Schubert
            Reporter:
            ma6174
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: