SERVER-4810 (Core Server)

If a replica set member with higher priority comes online, current primary relinquishes primary state

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.0.2
    • Fix Version/s: 2.0.5, 2.1.1
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      Ubuntu 10.04 LTS x64
      Mac OS X 10.7
    • Operating System:
      ALL

      Description

      If a replica set member with a higher priority comes online, the current primary relinquishes primary state regardless of that member's state, even before the member has reported any status. As a result, a functioning set loses its primary and has to hold an election just because a higher-priority member came online. This does not happen if no member has a priority set.

      1. Set up a replica set with 3 nodes, 2 of them with high priorities:

      {
          "_id" : "test1",
          "members" : [
              {
                  "_id" : 0,
                  "host" : "localhost:27017",
                  "priority" : 3
              },
              {
                  "_id" : 1,
                  "host" : "localhost:27018",
                  "priority" : 2
              },
              {
                  "_id" : 2,
                  "host" : "localhost:27019"
              }
          ]
      }

      2. Wait for the set to come online with "localhost:27017" as the primary
      3. On "localhost:27017" issue rs.stepDown() so that "localhost:27018" becomes primary
      4. Kill mongod on "localhost:27017"
      5. Delete the data directory on "localhost:27017" so that when it comes up it is not immediately ready to take over as primary and requires a resync.
      6. Restart mongod on "localhost:27017"
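
Steps 1 and 3 could be sketched in the mongo shell roughly as follows. `rs.initiate()` and `rs.stepDown()` are the standard shell helpers; the `typeof rs` guard is not part of the original report and is only there so the snippet also loads outside the shell.

```javascript
// Replica set configuration from step 1 (member 2 keeps the default priority).
var config = {
    _id: "test1",
    members: [
        { _id: 0, host: "localhost:27017", priority: 3 },
        { _id: 1, host: "localhost:27018", priority: 2 },
        { _id: 2, host: "localhost:27019" }
    ]
};

// Only call the shell helper when running inside the mongo shell.
if (typeof rs !== "undefined") {
    // Step 1: run once while connected to localhost:27017.
    rs.initiate(config);
    // Step 3: run later on localhost:27017, once it is primary:
    // rs.stepDown();
}
```

Steps 4 to 6 (killing mongod, deleting the data directory, restarting) happen outside the shell and depend on how each mongod was started, so they are not shown.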

      What happens: "localhost:27018" immediately loses its primary state then gets reelected as primary
      What should happen: "localhost:27018" should remain primary until it is safe to re-elect "localhost:27017" as the higher priority node
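
The expected behaviour can be illustrated with a small decision function. This is a hypothetical sketch, not MongoDB's election code and not the fix that was applied: `shouldYieldTo` and the member objects are made up for illustration, and the rule "only a SECONDARY is safe to elect" is an assumption standing in for whatever "safe" means in the server.

```javascript
// Hypothetical helper, not MongoDB's implementation: decide whether the
// current primary should yield to another member of the set.
function shouldYieldTo(self, other) {
    // A member without a higher priority never triggers a step-down.
    if (other.priority <= self.priority) {
        return false;
    }
    // Assumption: only a member in SECONDARY state is safe to elect.
    // DOWN, STARTUP2 and RECOVERING members cannot take over yet.
    return other.state === "SECONDARY";
}

var primary = { host: "localhost:27018", priority: 2, state: "PRIMARY" };
var resyncing = { host: "localhost:27017", priority: 3, state: "STARTUP2" };
var ready = { host: "localhost:27017", priority: 3, state: "SECONDARY" };

var yieldWhileResyncing = shouldYieldTo(primary, resyncing); // primary stays
var yieldWhenReady = shouldYieldTo(primary, ready);          // primary yields
```

Under this rule "localhost:27018" would keep primary state while "localhost:27017" is still resyncing, and would only yield once that member reports SECONDARY.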

      Log from "localhost:27018"

      Mon Jan 30 15:50:17 [rsHealthPoll] DBClientCursor::init call() failed
      Mon Jan 30 15:50:17 [rsHealthPoll] replSet info localhost:27017 is down (or slow to respond): DBClientBase::findN: transport error: localhost:27017 query: { replSetHeartbeat: "test1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
      Mon Jan 30 15:50:17 [rsHealthPoll] replSet member localhost:27017 is now in state DOWN
      Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27017 is up
      Mon Jan 30 15:50:33 [rsMgr] stepping down localhost:27018
      Mon Jan 30 15:50:33 [rsMgr] replSet relinquishing primary state
      Mon Jan 30 15:50:33 [rsMgr] replSet SECONDARY
      Mon Jan 30 15:50:33 [rsMgr] replSet closing client sockets after reqlinquishing primary
      Mon Jan 30 15:50:33 [conn1] end connection 127.0.0.1:57612
      Mon Jan 30 15:50:33 [rsHealthPoll] replSet info localhost:27019 is down (or slow to respond): socket exception
      Mon Jan 30 15:50:33 [rsHealthPoll] replSet member localhost:27019 is now in state DOWN
      Mon Jan 30 15:50:33 [rsMgr] replSet not electing self, not all members up and we have been up less than 5 minutes
      Mon Jan 30 15:50:35 [conn12] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:57654] 
      Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is up
      Mon Jan 30 15:50:35 [rsHealthPoll] replSet member localhost:27019 is now in state SECONDARY
      Mon Jan 30 15:50:35 [rsMgr] not electing self, localhost:27019 would veto
      Mon Jan 30 15:50:37 [rsMgr] not electing self, localhost:27019 would veto
      Mon Jan 30 15:50:40 [conn10] end connection 127.0.0.1:57650
      Mon Jan 30 15:50:40 [initandlisten] connection accepted from 127.0.0.1:57680 #13
      Mon Jan 30 15:50:41 [rsMgr] not electing self, localhost:27019 would veto
      Mon Jan 30 15:50:43 [initandlisten] connection accepted from 127.0.0.1:57683 #14
      Mon Jan 30 15:50:46 [rsHealthPoll] replSet member localhost:27017 is now in state STARTUP2
      Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
      Mon Jan 30 15:50:46 [rsMgr] not electing self, localhost:27017 would veto
      Mon Jan 30 15:50:52 [rsMgr] replSet info electSelf 1
      Mon Jan 30 15:50:52 [rsMgr] replSet PRIMARY
      Mon Jan 30 15:50:54 [rsHealthPoll] replSet member localhost:27017 is now in state RECOVERING
      Mon Jan 30 15:50:54 [initandlisten] connection accepted from 127.0.0.1:57687 #15
      Mon Jan 30 15:50:59 [conn14] end connection 127.0.0.1:57683
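
To put a number on the impact, the time the set spent without a primary can be read off the two `[rsMgr]` lines that mark the step-down and the re-election. A minimal sketch, assuming both timestamps fall on the same day; `secondsOfDay` is a helper invented for this example.

```javascript
// Two lines copied from the log above.
var logLines = [
    "Mon Jan 30 15:50:33 [rsMgr] replSet relinquishing primary state",
    "Mon Jan 30 15:50:52 [rsMgr] replSet PRIMARY"
];

// Extract the HH:MM:SS field and convert it to seconds since midnight.
function secondsOfDay(line) {
    var m = /(\d{2}):(\d{2}):(\d{2})/.exec(line);
    if (m === null) {
        throw new Error("no timestamp in: " + line);
    }
    return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
}

// Assumes both lines are from the same day, as they are here.
var gapSeconds = secondsOfDay(logLines[1]) - secondsOfDay(logLines[0]);
// gapSeconds is 19: the set had no primary from 15:50:33 to 15:50:52.
```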

            People

            Assignee:
            Kristina Chodorow (kristina)
            Reporter:
            David Mytton (boxedice)