Core Server / SERVER-2544

Two primaries with network partitioned replica set (non-transient)

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 1.7.6
    • Affects Version/s: 1.6.5, 1.7.5
    • Component/s: None
    • Labels:
      None
    • Environment:
      Ubuntu 9.04 (32bit), Ubuntu 10.04.1 (32bit)
    • Operating System: Linux

      Take three hosts in a replica set:

      config = {_id: 'test1', members: [
      {_id: 0, host: 'sf1'},
      {_id: 1, host: 'ny1'},
      {_id: 2, host: 'uk1'}]
      }

      sf1 is master.

      A 'routing issue' occurs ( root@uk1:~# route add -host sf1 reject ), such that:

      sf1 can talk to ny1.
      ny1 can talk to uk1.
      sf1 cannot talk to uk1.
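
The connectivity above can be sketched as a small model to check which members still see a majority. This is only an illustration, not MongoDB code: the `links` list, `canReach`, and `seesMajority` are hypothetical names, and links are assumed to be symmetric.

```javascript
// Hypothetical model of the partition described above; not MongoDB code.
const members = ['sf1', 'ny1', 'uk1'];

// Links that still work, assumed to be bidirectional.
const links = [['sf1', 'ny1'], ['ny1', 'uk1']];

// True if a and b are the same member or share a working link.
function canReach(a, b) {
  return a === b || links.some(([x, y]) => (x === a && y === b) || (x === b && y === a));
}

// A member sees a majority if it can reach more than half of the set,
// counting itself.
function seesMajority(node) {
  const visible = members.filter((m) => canReach(node, m)).length;
  return visible > members.length / 2;
}

// Per-member view: who is visible, and whether that is a majority.
const view = Object.fromEntries(members.map((m) => [m, {
  visible: members.filter((o) => canReach(m, o)),
  majority: seesMajority(m),
}]));

console.log(view);
```

Under these assumptions every member sees at least two of the three hosts, so a majority check alone cannot tell sf1 and uk1 apart.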

      sf1 notices that uk1 has gone quiet and remains master. (It is master and can see a majority, so that's reasonable.)
      uk1 votes for itself. (It can see a majority but no master, so that's also reasonable.)
      ny1 votes for uk1. (That's probably less sensible, given that it can already see a master.)
      ny1 then bemoans the fact that there are two primaries.
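
One way to express the objection to ny1's vote is a rule under which a voter refuses to vote for a candidate while it can still see a different primary. The sketch below is hypothetical: `shouldVoteYea` and the peer objects are invented for illustration and are not taken from the MongoDB source or from the eventual fix.

```javascript
// Hypothetical voting rule; names and data shapes are illustrative only,
// not MongoDB internals.
// A voter grants its vote only if it cannot currently see a primary
// other than the candidate itself.
function shouldVoteYea(voter, candidateId) {
  const visiblePrimary = voter.peers.find(
    (p) => p.reachable && p.state === 'PRIMARY' && p.id !== candidateId
  );
  return visiblePrimary === undefined;
}

// ny1's assumed view at the moment uk1 (member 2) asks for a vote.
const ny1 = {
  id: 1,
  peers: [
    { id: 0, host: 'sf1', reachable: true, state: 'PRIMARY' },
    { id: 2, host: 'uk1', reachable: true, state: 'SECONDARY' },
  ],
};

console.log(shouldVoteYea(ny1, 2)); // false: ny1 can still see sf1 as primary
```

With such a rule ny1 would have voted nay for member 2, and uk1 could not have gathered a majority.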

      Log entries:

      sf1:
      Thu Feb 10 17:29:39 [conn2] end connection uk1:35740
      Thu Feb 10 17:29:57 [ReplSetHealthPollTask] replSet info uk1 is now down (or slow to respond)

      ny1:
      Thu Feb 10 17:29:37 [conn4] replSet info voting yea for 2
      Thu Feb 10 17:29:39 [ReplSetHealthPollTask] replSet uk1 PRIMARY
      Thu Feb 10 17:29:39 [rs Manager] replSet warning DIAG two primaries (transiently)
      Thu Feb 10 17:29:45 [rs Manager] replSet warning DIAG two primaries (transiently)
      Thu Feb 10 17:29:51 [rs Manager] replSet warning DIAG two primaries (transiently)
      (previous message continues to repeat - the situation doesn't resolve until uk1 is un-partitioned again)

      uk1:
      Thu Feb 10 17:29:37 [ReplSetHealthPollTask] replSet info sf1 is now down (or slow to respond)
      Thu Feb 10 17:29:37 [rs Manager] replSet info electSelf 2
      Thu Feb 10 17:29:37 [rs Manager] replSet PRIMARY

      The real-world impact is probably limited: if I repeat this scenario with frequent writes to sf1, uk1, when partitioned in this way, correctly detects that it is not current ("[rs Manager] replSet info not electing self, we are not freshest").
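
The "not freshest" behaviour can be pictured as a candidate comparing its own last applied operation time with that of every member it can reach. This is a rough sketch under assumptions: `isFreshest` is a hypothetical helper, optimes are simplified to plain numbers, and the values are made up. It assumes uk1 has fallen behind ny1 while sf1 keeps taking writes.

```javascript
// Hypothetical freshness check; the optime values are made up for illustration.
// A candidate is "freshest" if no reachable peer has a newer optime.
function isFreshest(selfOptime, reachablePeerOptimes) {
  return reachablePeerOptimes.every((t) => selfOptime >= t);
}

// uk1 can still reach ny1, which is assumed to keep replicating writes from sf1.
const uk1Optime = 100;
const ny1Optime = 105;

if (!isFreshest(uk1Optime, [ny1Optime])) {
  console.log('not electing self, we are not freshest');
}
```

When sf1 is idle the optimes are equal, the check passes, and uk1 goes on to elect itself, which matches the logs above.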

      Relates to forum discussion: http://groups.google.com/group/mongodb-user/browse_thread/thread/b2f01c106f7b6841

            Assignee:
            kristina Kristina Chodorow (Inactive)
            Reporter:
            tabascoterrier Sam Bryan
            Votes:
            0
            Watchers:
            2
