Type: Bug
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels:
- elections

Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The setup is as follows:

A replica set with 4 nodes (A, B, C, D), all with priority 0 except the first node, A, so only A can become primary.

Nodes B and C are each configured with a slaveDelay of either 0s or 40s, alternating via reconfigs.

Node D is blackholed from node A symmetrically: A can't talk to D, and D can't talk to A.
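
As a rough illustration of this setup, the sketch below builds the kind of config document each reconfig might submit. It is not taken from the attached sync_change_source.js: `buildConfig`, the set name `testSet`, and the addresses for nodes B and D are hypothetical, and the assumption that exactly one of B and C is delayed at a time is one reading of "alternating". Only the hosts 127.0.0.2:31000 (node A) and 127.0.0.4:31002 appear in the log; mapping the latter to node C is a guess.

```javascript
// Hypothetical helper: builds a replica set config for the four-node setup.
// Host addresses for B and D and the set name are assumptions for illustration.
function buildConfig(version, delayedNode) {
  const hosts = {
    A: "127.0.0.2:31000", // seen in the log (node A)
    B: "127.0.0.3:31001", // assumed
    C: "127.0.0.4:31002", // seen in the log; mapping to C is assumed
    D: "127.0.0.5:31003", // assumed
  };
  const members = Object.keys(hosts).map((name, i) => {
    const member = { _id: i, host: hosts[name] };
    // Only node A may become primary; every other node gets priority 0.
    member.priority = name === "A" ? 1 : 0;
    // B and C carry a slaveDelay of 0s or 40s, depending on which is delayed.
    if (name === "B" || name === "C") {
      member.slaveDelay = name === delayedNode ? 40 : 0;
    }
    return member;
  });
  return { _id: "testSet", version: version, members: members };
}

// Alternate the delayed node across successive config versions.
const first = buildConfig(2, "B");
const second = buildConfig(3, "C");
console.log(first.members.map((m) => m.slaveDelay));
console.log(second.members.map((m) => m.slaveDelay));
```

In a mongo shell, a document like this would typically be passed to `rs.reconfig()` on the primary. The blackhole between A and D is a network-level condition and is not expressed in the config.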

At first, node D correctly switches its sync source between nodes B and C, depending on which is delayed. Each time a reconfig happens, node A steps down to secondary and is then re-elected primary.

At some point, though, node A appears unable to become primary again after a reconfig. Node A's log contains a strange veto message in which the candidate and the supposedly existing primary are the same host:

 m31000| Thu Jan 17 17:05:00.147 [rsMgr] not electing self, 127.0.0.4:31002 would veto with '127.0.0.2:31000 is trying to elect itself but 127.0.0.2:31000 is already primary and more up-to-date'
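
To spot this condition when scanning the attached output, a small diagnostic along these lines could help. It is only a sketch: `parseVeto` is a hypothetical helper, and the regular expression assumes the message wording shown in the log line above, which may differ between server versions.

```javascript
// Hypothetical diagnostic: extract the hosts from a "would veto" log line and
// flag the case where the candidate is also named as the existing primary.
function parseVeto(line) {
  const re = /not electing self, (\S+) would veto with '(\S+) is trying to elect itself but (\S+) is already primary/;
  const m = re.exec(line);
  if (m === null) {
    return null; // not a veto line in the assumed format
  }
  return {
    vetoer: m[1],
    candidate: m[2],
    primary: m[3],
    selfReferential: m[2] === m[3],
  };
}

const line =
  " m31000| Thu Jan 17 17:05:00.147 [rsMgr] not electing self, 127.0.0.4:31002 would veto with " +
  "'127.0.0.2:31000 is trying to elect itself but 127.0.0.2:31000 is already primary and more up-to-date'";
console.log(parseVeto(line));
```

For the line above, `selfReferential` is true: 127.0.0.2:31000 is both the candidate and the host the vetoing node believes is already primary.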

A test to reproduce the problem and the output from two runs are attached below (with replSetGetStatus output from all nodes every 5s during the problem period).

currentTest_failure_same_host_veto.txt
177 kB
Jan 17 2013 10:15:27 PM UTC
currentTest.txt
569 kB
Jan 17 2013 10:15:27 PM UTC
sync_change_source.js
3 kB
Jan 17 2013 10:15:27 PM UTC

Assignee: Davide Italiano (Inactive)
Reporter: Greg Studer (Inactive)
Participants: Davide Italiano, Eric Milkie, Greg Studer, Kristina Chodorow
Votes: 0
Watchers: 3

Created: Jan 17 2013 10:13:43 PM UTC
Updated: Jan 29 2015 07:34:21 PM UTC
Resolved: Jan 29 2015 07:34:21 PM UTC
