Core Server / SERVER-4918

Lower replica set reader timeout (or make it configurable)

    • Type: Improvement
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.0.2
    • Component/s: Networking
    • Environment:
      Linux 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 17:25:22 CST 2012 i686 i686 i386 GNU/Linux

      We are trying to simulate a network split (partition) on a Mongo 2.0.2
      replica set consisting of three nodes. Basically, we DROP all packets
      between the PRIMARY and the SECONDARIES.

      PRIMARY = lk-mm1
      SECONDARY1 = lk-mm2
      SECONDARY2 = lk-mm4
      

      On primary:

      iptables -A INPUT --src lk-mm4 -j DROP
      iptables -A OUTPUT --dst lk-mm4 -j DROP
      iptables -A INPUT --src lk-mm2 -j DROP
      iptables -A OUTPUT --dst lk-mm2 -j DROP
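
The four rules above follow one pattern per peer: drop inbound traffic from it and outbound traffic to it. As a rough sketch, the same commands can be generated for any list of peers, together with the matching `-D` commands that remove the rules once the test is over. The helper name `partition_rules` is hypothetical and not part of the original report; the script only prints the commands and does not run them.

```python
def partition_rules(peers, action="-A"):
    """Return iptables commands that drop all traffic to and from each peer.

    action is "-A" to append the rules or "-D" to delete them again.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    commands = []
    for peer in peers:
        # One rule per direction, mirroring the commands in the report.
        commands.append(f"iptables {action} INPUT --src {peer} -j DROP")
        commands.append(f"iptables {action} OUTPUT --dst {peer} -j DROP")
    return commands


if __name__ == "__main__":
    # Host names are the two secondaries from the report.
    for line in partition_rules(["lk-mm4", "lk-mm2"]):
        print(line)
```

Calling `partition_rules(["lk-mm4", "lk-mm2"], action="-D")` yields the commands that heal the partition again.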
      

      The primary server correctly steps down, and one of the secondaries
      becomes the new primary.

      The problem is that the other secondary still tries to read the oplog
      from the former PRIMARY. The timeout kicks in only after about
      15 minutes. Since we are using writeConcern=2, the replica set does not
      accept writes for quite a long time.
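
To illustrate the request in the title, here is a minimal sketch of what a configurable read timeout does to a blocking socket read. It is not mongod's sync code (which is C++) and it does not reproduce the attached patch; the function names, the 0.2 second value, and the use of a local listening socket that never answers as a stand-in for the unreachable former primary are all assumptions made for the example.

```python
import socket
import time


def read_with_timeout(host, port, timeout_s):
    """Connect, try to read once, and return the seconds spent waiting."""
    start = time.monotonic()
    # The timeout passed here also applies to later recv() calls.
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            conn.recv(1024)  # blocks, because the peer never sends anything
        except socket.timeout:
            pass  # the read is abandoned after roughly timeout_s seconds
    return time.monotonic() - start


def demo(timeout_s=0.2):
    # A listening socket that never accepts or replies plays the role of
    # the silent peer (an assumption made purely for illustration).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen(1)
        host, port = silent.getsockname()
        return read_with_timeout(host, port, timeout_s)


if __name__ == "__main__":
    print(f"gave up after {demo():.2f} s")
```

With an application-level timeout the reader gives up after the configured interval. Without one, the read ends only when the operating system reports a failure, which in the log below shows up as errno 110 (connection timed out).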

      Thu Feb  9 14:07:34 [rsSync] replSet syncing to: lk-mm1:27017
      Thu Feb  9 14:08:12 [rsHealthPoll] DBClientCursor::init call() failed
      Thu Feb  9 14:08:12 [rsHealthPoll] replSet info lk-mm1:27017 is down (or slow to respond): DBClientBase::findN: transport error: lk-mm1:27017 query: { replSetHeartbeat: "gdc", v: 3, pv: 1, checkEmpty: false, from: "lk-mm2:27017" }
      Thu Feb  9 14:08:12 [rsHealthPoll] replSet member lk-mm1:27017 is now in state DOWN
      Thu Feb  9 14:08:12 [rsMgr] not electing self, lk-mm4:27017 would veto
      Thu Feb  9 14:08:12 [conn597] replSet info voting yea for lk-mm4:27017 (2)
      Thu Feb  9 14:08:13 [rsHealthPoll] replSet member lk-mm4:27017 is now in state PRIMARY
      Thu Feb  9 14:08:24 [rsHealthPoll] couldn't connect to lk-mm1: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:11:34 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017 
      Thu Feb  9 14:14:44 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:17:54 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:21:04 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:24:14 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:24:15 [rsSync] Socket recv() errno:110 Connection timed out 10.244.123.13:27017
      Thu Feb  9 14:24:15 [rsSync] SocketException: remote: 10.244.123.13:27017 error: 9001 socket exception [1] server [10.244.123.13:27017]
      Thu Feb  9 14:24:15 [rsSync] Socket flush send() errno:32 Broken pipe 10.244.123.13:27017
      Thu Feb  9 14:24:15 [rsSync]   caught exception (socket exception) in destructor (~PiggyBackData)
      Thu Feb  9 14:24:15 [rsSync] replSet syncThread: 10278 dbclient error communicating with server: lk-mm1:27017
      Thu Feb  9 14:24:26 [rsSync] replSet syncing to: lk-mm4:27017 
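
The length of the stall can be read straight off the timestamps in the log above. The following sketch measures the time between the heartbeat thread marking lk-mm1 as DOWN and rsSync finally switching to lk-mm4. The helper names are hypothetical, and the code assumes both lines were written on the same day, so only the HH:MM:SS field is compared.

```python
from datetime import datetime


def time_of_day(line):
    """Parse the HH:MM:SS field of a line like 'Thu Feb  9 14:07:34 [rsSync] ...'."""
    return datetime.strptime(line.split()[3], "%H:%M:%S")


def stall_duration(start_line, end_line):
    """Time between two log lines written on the same day."""
    return time_of_day(end_line) - time_of_day(start_line)


# Two lines copied from the log in the report.
down = "Thu Feb  9 14:08:12 [rsHealthPoll] replSet member lk-mm1:27017 is now in state DOWN"
resync = "Thu Feb  9 14:24:26 [rsSync] replSet syncing to: lk-mm4:27017"

if __name__ == "__main__":
    print(stall_duration(down, resync))
```

For these two lines the difference is a little over 16 minutes, in line with the roughly 15 minute delay described above.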
      

      See the user group for more details: https://groups.google.com/group/mongodb-user/browse_thread/thread/935bdbd868d8ff1d

        1. timeout-patch.patch
          0.9 kB
          Lukas Krecan

            Assignee:
            kristina Kristina Chodorow (Inactive)
            Reporter:
            dart0 Lukas Krecan
            Votes:
            0
            Watchers:
            3
