  1. Core Server
  2. SERVER-12163

Replica Set failover time is more than 10 sec

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Affects Version/s: 2.4.6
    • Component/s: Replication
    • Environment:
      CentOS
    • Linux
    • Power off the primary DB VM and check the secondary DB logs.

      We have a three-member replica set (primary, secondary, and arbiter), with each member on a different virtual machine.

      To reproduce the issue, we power off the primary DB VM. We observed that failover takes more than 10 seconds.
      What is the optimal failover time? Is there a tuning parameter to reduce it?

      Thu Dec 19 02:15:55.994 [rsSyncNotifier] replset setting oplog notifier to sessionmgr01:27717
      Thu Dec 19 02:19:09.121 [rsHealthPoll] DBClientCursor::init call() failed
      Thu Dec 19 02:19:09.121 [rsHealthPoll] replSet info sessionmgr01:27717 is down (or slow to respond):
      Thu Dec 19 02:19:09.121 [rsHealthPoll] replSet member sessionmgr01:27717 is now in state DOWN
      Thu Dec 19 02:19:09.122 [rsMgr] replSet info electSelf 2
      Thu Dec 19 02:19:17.124 [rsHealthPoll] replset info sessionmgr01:27717 heartbeat failed, retrying
      Thu Dec 19 02:19:28.071 [rsBackgroundSync] Socket recv() timeout 192.168.92.59:27717
      Thu Dec 19 02:19:28.071 [rsBackgroundSync] SocketException: remote: 192.168.92.59:27717 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.92.59:27717]
      Thu Dec 19 02:19:28.072 [rsBackgroundSync] replSet sync source problem: 10278 dbclient error communicating with server: sessionmgr01:27717
      Thu Dec 19 02:19:28.072 [rsSyncNotifier] Socket recv() timeout 192.168.92.59:27717
      Thu Dec 19 02:19:28.072 [rsSyncNotifier] SocketException: remote: 192.168.92.59:27717 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.92.59:27717]
      Thu Dec 19 02:19:28.072 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: sessionmgr01:27717
      Thu Dec 19 02:19:28.072 [rsMgr] replSet PRIMARY
      Thu Dec 19 02:19:29.127 [rsHealthPoll] replset info sessionmgr01:27717 heartbeat failed, retrying
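
      To put a number on the failover time reported above, the delay can be read off the secondary's log. The following is a minimal sketch, not part of the original report: it assumes the failover window runs from the "is now in state DOWN" line to the "replSet PRIMARY" line, and that both lines fall on the same day. The helper names (parse_time, first_time, failover_seconds) are hypothetical.

```python
from datetime import datetime

# Three lines copied verbatim from the secondary's log in this report.
LOG = """\
Thu Dec 19 02:19:09.121 [rsHealthPoll] replSet member sessionmgr01:27717 is now in state DOWN
Thu Dec 19 02:19:09.122 [rsMgr] replSet info electSelf 2
Thu Dec 19 02:19:28.072 [rsMgr] replSet PRIMARY
"""


def parse_time(line):
    # The fourth whitespace-separated field is the time of day, e.g. "02:19:09.121".
    # Assumption: only the time of day is compared, so both events must be on the same day.
    return datetime.strptime(line.split()[3], "%H:%M:%S.%f")


def first_time(lines, marker):
    # Return the time of the first log line that contains the marker text.
    for line in lines:
        if marker in line:
            return parse_time(line)
    raise ValueError("marker not found in log: " + marker)


def failover_seconds(log_text):
    # Seconds between the secondary marking the old primary DOWN
    # and the secondary itself reporting the PRIMARY state.
    lines = log_text.splitlines()
    down = first_time(lines, "is now in state DOWN")
    promoted = first_time(lines, "replSet PRIMARY")
    return (promoted - down).total_seconds()


print(failover_seconds(LOG))  # 18.951
```

      For the log above this gives roughly 19 seconds between the secondary detecting the failure and becoming primary. It does not include the time between the actual power-off and the first failed health poll, which the secondary's log cannot show.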

            Assignee:
            Matt Dannenberg
            Reporter:
            Amit Wankhede
            Votes:
            0
            Watchers:
            7
