Details
Type: Bug
Resolution: Done
Priority: Critical - P2
Component: Replication
Operating System: ALL
Description
This results in the liveness property of leader election being lost, i.e. a new master is never elected.
A relatively easy way to trigger it:
1) Spawn a replica set locally; replicate.py in https://github.com/dcci/mongo-replication-perf can be used for this.
2) Once a primary is elected, drop all incoming local connections directed to it. Assuming the primary is listening on port 30001, this should be enough (on Linux, or any *NIX flavour that supports iptables):
# iptables -A INPUT -j DROP -p tcp -i lo --destination-port 30001
# iptables -A INPUT -j DROP -p tcp --destination-port 30001
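To make the reproduction easy to undo, the same rule can be generated for both insertion and removal. The following is a minimal Python sketch and not part of the original report: `iptables_rule` and `apply_rule` are hypothetical helper names, and the only iptables option used beyond those in the commands above is `-D`, which deletes a matching rule. Applying the rule requires root.

```python
import subprocess


def iptables_rule(port, action="-A", interface=None):
    """Build the iptables argument list that drops incoming TCP traffic to `port`.

    action is "-A" to append the rule or "-D" to delete it again.
    interface (e.g. "lo") is optional and restricts the rule to one interface.
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    args = ["iptables", action, "INPUT", "-j", "DROP", "-p", "tcp"]
    if interface is not None:
        args += ["-i", interface]
    args += ["--destination-port", str(port)]
    return args


def apply_rule(args):
    """Run the command. Assumes iptables is on PATH and the caller is root."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    print(" ".join(iptables_rule(30001, "-A", "lo")))  # block the primary
    print(" ".join(iptables_rule(30001, "-D", "lo")))  # restore connectivity
```

Deleting the rule with `-D` after the test should let the secondaries reach the primary again, which is useful when repeating the experiment.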
The secondaries still receive heartbeats from the primary, so they do not change state, as the log shows:
2014-06-09T11:45:47.792-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001 after 5000 milliseconds, giving up.
2014-06-09T11:45:47.792-0700 [rsHealthPoll] replset info localhost:30001 heartbeat failed, retrying
2014-06-09T11:45:50.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 5 more seconds
2014-06-09T11:45:50.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 5 more seconds
2014-06-09T11:45:52.793-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001, reason: errno:115 Operation now in progress
2014-06-09T11:45:52.793-0700 [rsHealthPoll] replset info localhost:30001 just heartbeated us, but our heartbeat failed: , not changing state
2014-06-09T11:45:55.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 0 more seconds
2014-06-09T11:45:55.698-0700 [rsBackgroundSync] replSet not trying to sync from localhost:30001, it is vetoed for 0 more seconds
2014-06-09T11:45:59.069-0700 [conn46] end connection 127.0.0.1:52528 (1 connection now open)
2014-06-09T11:45:59.069-0700 [initandlisten] connection accepted from 127.0.0.1:52545 #48 (2 connections now open)
2014-06-09T11:45:59.835-0700 [rsHealthPoll] warning: Failed to connect to 127.0.0.1:30001 after 5000 milliseconds, giving up.
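The telling line in the log above is "just heartbeated us, but our heartbeat failed", which a secondary prints when the primary can still reach it but its own connection to the primary fails. As a rough aid for spotting this in a longer log, here is a small Python sketch. It is not part of the original report; `find_asymmetric_heartbeats` is a hypothetical helper, and the pattern is derived only from the message text shown above, so other server versions may word it differently.

```python
import re

# Matches the message logged when a member still receives heartbeats from a
# host but cannot complete its own heartbeat to that host.
# Assumption: the wording is exactly as in the log excerpt above.
ASYMMETRIC = re.compile(
    r"\[rsHealthPoll\] replset info (?P<host>\S+) just heartbeated us, "
    r"but our heartbeat failed"
)


def find_asymmetric_heartbeats(lines):
    """Return the host named in each asymmetric-heartbeat log line, in order."""
    hosts = []
    for line in lines:
        match = ASYMMETRIC.search(line)
        if match:
            hosts.append(match.group("host"))
    return hosts
```

It could be used as `find_asymmetric_heartbeats(open(path))`, where `path` is the log file of a secondary. A non-empty result naming the current primary suggests the one-way partition described here.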