[SERVER-2544] Two primaries with network partitioned replica set (non-transient) Created: 13/Feb/11  Updated: 12/Jul/16  Resolved: 14/Feb/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.6.5, 1.7.5
Fix Version/s: 1.7.6

Type: Bug
Priority: Major - P3
Reporter: Sam Bryan
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 9.04 (32bit), Ubuntu 10.04.1 (32bit)
Operating System: Linux

Description

Take three hosts in a replica set:

config = {_id: 'test1', members: [
    {_id: 0, host: 'sf1'},
    {_id: 1, host: 'ny1'},
    {_id: 2, host: 'uk1'}
]}

sf1 is master.

A 'routing issue' occurs ( root@uk1:~# route add -host sf1 reject ), such that:

sf1 can talk to ny1.
ny1 can talk to uk1.
sf1 cannot talk to uk1.
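
To make the topology concrete, here is a minimal sketch of the partition as a reachability table. It assumes the rejected route breaks the sf1/uk1 link in both directions, and that "majority" means more than half of the configured members. The names `links`, `canReach`, `visibleFrom` and `seesMajority` are made up for this illustration and are not MongoDB APIs.

```javascript
// Toy model of the partition (illustrative only, not MongoDB code).
const members = ["sf1", "ny1", "uk1"];

// Links that still work after the route to sf1 is rejected on uk1.
// Assumption: each link is usable in both directions.
const links = [["sf1", "ny1"], ["ny1", "uk1"]];

// True if host a can talk to host b (a host can always reach itself).
function canReach(a, b) {
  return a === b ||
    links.some(([x, y]) => (x === a && y === b) || (x === b && y === a));
}

// Members visible from `host`, including `host` itself.
function visibleFrom(host) {
  return members.filter((m) => canReach(host, m));
}

// True if `host` can see more than half of the configured members.
function seesMajority(host) {
  return visibleFrom(host).length * 2 > members.length;
}

for (const host of members) {
  console.log(host, "sees", visibleFrom(host).join(","),
    "majority:", seesMajority(host));
}
```

Under these assumptions every host, including both sf1 and uk1, still sees two or more of the three members, which is why neither side of the partition considers itself in a minority.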

sf1 notices uk1 has gone quiet, and remains a master. (it's a master, it can see a majority, so that's reasonable)
uk1 votes for itself. (it can see a majority, but no master, so that's also reasonable)
ny1 votes for uk1. (that's probably less sensible, given that it can already see a master)
ny1 then bemoans the fact that there are two primaries.
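
The sequence above can be reproduced with a toy election model. This is a sketch under stated assumptions, not MongoDB's election code and not necessarily what the fix for this ticket does: the `voterChecksPrimary` flag stands for a hypothetical rule where a voter declines to vote yea while it can reach a different primary. All function names are invented for the example.

```javascript
// Toy election model (illustrative only, not MongoDB's implementation).
const members = ["sf1", "ny1", "uk1"];
const links = [["sf1", "ny1"], ["ny1", "uk1"]]; // sf1 <-> uk1 is broken

const canReach = (a, b) =>
  a === b ||
  links.some(([x, y]) => (x === a && y === b) || (x === b && y === a));

// Run one election for `candidate` given the current list of primaries.
// voterChecksPrimary is a hypothetical rule: when true, a voter refuses
// to vote yea if it can reach a primary other than the candidate.
function elect(candidate, primaries, voterChecksPrimary) {
  let yeas = 0;
  for (const voter of members) {
    if (!canReach(candidate, voter)) continue; // request never arrives
    const seesOtherPrimary = primaries.some(
      (p) => p !== candidate && canReach(voter, p)
    );
    if (voterChecksPrimary && seesOtherPrimary) continue; // voter declines
    yeas += 1;
  }
  return yeas * 2 > members.length; // strict majority of the full set
}

// sf1 starts as primary; uk1 then stands for election.
function primariesAfterElection(voterChecksPrimary) {
  const primaries = ["sf1"];
  if (elect("uk1", primaries, voterChecksPrimary)) primaries.push("uk1");
  return primaries;
}

console.log(primariesAfterElection(false)); // naive voting: sf1 and uk1
console.log(primariesAfterElection(true));  // with the check: sf1 only
```

With naive voting, uk1 collects its own vote plus ny1's and becomes a second primary, matching the logs below. With the hypothetical check, ny1 declines because it can still reach sf1, so uk1 gets only one vote out of three and stays secondary.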

Log entries:

sf1:
Thu Feb 10 17:29:39 [conn2] end connection uk1:35740
Thu Feb 10 17:29:57 [ReplSetHealthPollTask] replSet info uk1 is now down (or slow to respond)

ny1:
Thu Feb 10 17:29:37 [conn4] replSet info voting yea for 2
Thu Feb 10 17:29:39 [ReplSetHealthPollTask] replSet uk1 PRIMARY
Thu Feb 10 17:29:39 [rs Manager] replSet warning DIAG two primaries (transiently)
Thu Feb 10 17:29:45 [rs Manager] replSet warning DIAG two primaries (transiently)
Thu Feb 10 17:29:51 [rs Manager] replSet warning DIAG two primaries (transiently)
(previous message continues to repeat - the situation doesn't resolve until uk1 is un-partitioned again)

uk1:
Thu Feb 10 17:29:37 [ReplSetHealthPollTask] replSet info sf1 is now down (or slow to respond)
Thu Feb 10 17:29:37 [rs Manager] replSet info electSelf 2
Thu Feb 10 17:29:37 [rs Manager] replSet PRIMARY

The real-world impact of this is probably limited: if I repeat the scenario with frequent writes going to sf1, uk1, when partitioned in this way, correctly detects that it is not current ("[rs Manager] replSet info not electing self, we are not freshest") and does not elect itself.
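
The "not freshest" behaviour can be pictured with a small sketch. It assumes each member tracks the position of its last applied operation as a single comparable number (`optime` here is a simplification), and that uk1 learns it is behind from ny1, which can still replicate from sf1. `shouldElectSelf` is a made-up name, not a MongoDB function.

```javascript
// Toy freshness check (illustrative only, not MongoDB's implementation).
// Assumption: a larger optime means a more recently applied write.

// A candidate stands for election only if no reachable peer is ahead of it.
function shouldElectSelf(self, reachablePeers) {
  return reachablePeers.every((peer) => peer.optime <= self.optime);
}

// Idle set: nothing was written after the partition, so uk1 is as fresh
// as ny1 and goes ahead with the election.
const idle = shouldElectSelf(
  { host: "uk1", optime: 100 },
  [{ host: "ny1", optime: 100 }]
);

// Busy set: sf1 keeps taking writes and ny1 keeps applying them, so ny1
// is ahead of uk1 and uk1 stands down.
const busy = shouldElectSelf(
  { host: "uk1", optime: 100 },
  [{ host: "ny1", optime: 105 }]
);

console.log("idle set, uk1 elects self:", idle); // true
console.log("busy set, uk1 elects self:", busy); // false
```

This is why the two-primary state only shows up on a quiet set: with no writes after the partition, uk1 has no evidence that it is stale.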

Relates to forum discussion: http://groups.google.com/group/mongodb-user/browse_thread/thread/b2f01c106f7b6841



Comments
Comment by auto [ 13/Feb/11 ]

Author: Kristina (kchodorow) <kristina@10gen.com>

Message: eliminate two-primary edge case SERVER-2544
https://github.com/mongodb/mongo/commit/ed360a84f92d3c234ca9a4c2256de95be5562abe
