Core Server: SERVER-4918

Lower replica set reader timeout (or make it configurable)


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Networking
    • Labels:
      None
    • Environment:
      Linux 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 17:25:22 CST 2012 i686 i686 i386 GNU/Linux

      Description

      We are trying to simulate a network split (partition) on a MongoDB 2.0.2
      replica set consisting of three nodes. Basically, we DROP all packets
      between the PRIMARY and the SECONDARIES.

      PRIMARY = lk-mm1
      SECONDARY1 = lk-mm2
      SECONDARY2 = lk-mm4

      On the primary:

      iptables -A INPUT --src lk-mm4 -j DROP
      iptables -A OUTPUT --dst lk-mm4 -j DROP
      iptables -A INPUT --src lk-mm2 -j DROP
      iptables -A OUTPUT --dst lk-mm2 -j DROP
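
To repeat the experiment, the rules above can be wrapped in a small script that also removes them again. This is only a sketch: it assumes bash and root privileges on the primary, and the functions partition, heal and run and the DRY_RUN variable are hypothetical helpers, not part of any MongoDB tooling. iptables -D deletes a rule matching the same specification, so heal undoes exactly what partition added.

```shell
#!/usr/bin/env bash
# Sketch: isolate this host (the primary) from the given peers, then undo it.
# Assumes bash, root privileges and resolvable host names.
# Set DRY_RUN=1 to print the iptables commands instead of executing them.
set -euo pipefail

# Execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Append DROP rules for traffic from and to each peer (-A appends a rule).
partition() {
  local peer
  for peer in "$@"; do
    run iptables -A INPUT --src "$peer" -j DROP
    run iptables -A OUTPUT --dst "$peer" -j DROP
  done
}

# Delete the same rules again (-D removes the first matching rule).
heal() {
  local peer
  for peer in "$@"; do
    run iptables -D INPUT --src "$peer" -j DROP
    run iptables -D OUTPUT --dst "$peer" -j DROP
  done
}
```

Usage on lk-mm1: partition lk-mm2 lk-mm4 splits the set, heal lk-mm2 lk-mm4 restores it, and DRY_RUN=1 partition lk-mm2 only prints the commands it would run.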

      The primary correctly steps down, and one of the secondaries
      becomes the new primary.

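      The problem is that the other secondary (lk-mm2) keeps trying to read
      the oplog from the former PRIMARY. The timeout only kicks in after
      roughly 15 minutes: in the log below, lk-mm1 is marked DOWN at 14:08:12,
      but the sync thread does not give up on it until 14:24:15. Since we are
      using a write concern of w=2, writes also have to reach lk-mm2, so the
      replica set effectively does not accept writes for that whole period.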
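
One plausible explanation for the ~15 minute figure, which this ticket does not confirm, is that the sync thread is blocked in recv() and only the kernel's TCP retransmission limit ends the wait; the log below shows "Socket recv() errno:110 Connection timed out". On Linux, net.ipv4.tcp_retries2 defaults to 15, which the kernel documentation describes as a hypothetical timeout of 924.6 seconds (about 15.4 minutes). The sketch below reproduces that figure under the same simplifying assumptions (RTO starting at 200 ms, doubling on every retransmission, capped at 120 s); the real value also depends on the measured round-trip time. retrans_timeout is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: approximate Linux's TCP give-up time for a given tcp_retries2 value.
# Assumes the RTO starts at 200 ms (TCP_RTO_MIN), doubles after each
# retransmission and is capped at 120 s (TCP_RTO_MAX).
set -euo pipefail

# Print the approximate timeout in seconds for N retransmissions.
retrans_timeout() {
  local retries="$1"
  awk -v n="$retries" 'BEGIN {
    limit = n + 0                       # force a numeric comparison
    rto = 0.2                           # assumed initial RTO, in seconds
    total = 0
    for (i = 0; i <= limit; i++) {      # N retransmissions => N+1 RTO waits
      total += (rto < 120 ? rto : 120)  # RTO is capped at 120 s
      rto *= 2
    }
    printf "%.1f\n", total
  }'
}

# Use the host setting if readable, otherwise assume the default of 15.
retries="$(cat /proc/sys/net/ipv4/tcp_retries2 2>/dev/null || echo 15)"
echo "tcp_retries2=${retries}: about $(retrans_timeout "$retries") s until ETIMEDOUT"
```

Under the same assumptions, retrans_timeout 5 gives about 12.6 s. Lowering the setting (sysctl -w net.ipv4.tcp_retries2=5) would only be a host-wide workaround, not the per-connection reader timeout requested here, since it shortens the give-up time for every TCP connection on the machine.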

      Thu Feb  9 14:07:34 [rsSync] replSet syncing to: lk-mm1:27017
      Thu Feb  9 14:08:12 [rsHealthPoll] DBClientCursor::init call() failed
      Thu Feb  9 14:08:12 [rsHealthPoll] replSet info lk-mm1:27017 is down (or slow to respond): DBClientBase::findN: transport error: lk-mm1:27017 query: { replSetHeartbeat: "gdc", v: 3, pv: 1, checkEmpty: false, from: "lk-mm2:27017" }
      Thu Feb  9 14:08:12 [rsHealthPoll] replSet member lk-mm1:27017 is now in state DOWN
      Thu Feb  9 14:08:12 [rsMgr] not electing self, lk-mm4:27017 would veto
      Thu Feb  9 14:08:12 [conn597] replSet info voting yea for lk-mm4:27017 (2)
      Thu Feb  9 14:08:13 [rsHealthPoll] replSet member lk-mm4:27017 is now in state PRIMARY
      Thu Feb  9 14:08:24 [rsHealthPoll] couldn't connect to lk-mm1: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:11:34 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017 
      Thu Feb  9 14:14:44 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:17:54 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:21:04 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:24:14 [rsHealthPoll] couldn't connect to lk-mm1:27017: couldn't connect to server lk-mm1:27017
      Thu Feb  9 14:24:15 [rsSync] Socket recv() errno:110 Connection timed out 10.244.123.13:27017
      Thu Feb  9 14:24:15 [rsSync] SocketException: remote: 10.244.123.13:27017 error: 9001 socket exception [1] server [10.244.123.13:27017]
      Thu Feb  9 14:24:15 [rsSync] Socket flush send() errno:32 Broken pipe 10.244.123.13:27017
      Thu Feb  9 14:24:15 [rsSync]   caught exception (socket exception) in destructor (~PiggyBackData)
      Thu Feb  9 14:24:15 [rsSync] replSet syncThread: 10278 dbclient error communicating with server: lk-mm1:27017
      Thu Feb  9 14:24:26 [rsSync] replSet syncing to: lk-mm4:27017 

      See the user group thread for more details: https://groups.google.com/group/mongodb-user/browse_thread/thread/935bdbd868d8ff1d

              People

              Assignee:
              Kristina Chodorow (Inactive)
              Reporter:
              Lukas Krecan
              Votes:
              0
              Watchers:
              3
