- Type: Bug
- Resolution: Incomplete
- Priority: Critical - P2
- None
- Affects Version/s: 1.8.1
- Component/s: Replication
- Labels: None
- Environment: Linux
- ALL
The following errors in the secondary's log indicate that it had trouble reaching both of the other DBs in the replica set:
Tue Jun 21 08:58:28 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Tue Jun 21 08:58:28 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb2 is down (or slow to respond): DBClientBase::findOne: transport error: prod-c0-pacmandb2 query: { replSetHeartbeat: "pacman", v: 2, pv: 1, checkEmpty: false, from: "lab-c0-pacmandb1.lab" }
Tue Jun 21 08:58:30 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Tue Jun 21 08:58:30 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb1 is down (or slow to respond): DBClientBase::findOne: transport error: prod-c0-pacmandb1 query: { replSetHeartbeat: "pacman", v: 2, pv: 1, checkEmpty: false, from: "lab-c0-pacmandb1.lab" }
Tue Jun 21 08:59:33 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb2 is up
Tue Jun 21 08:59:34 [initandlisten] connection accepted from 10.10.***.***:54941 #1492
Tue Jun 21 08:59:34 [initandlisten] connection accepted from 10.10.***.***:33786 #1493
Tue Jun 21 08:59:36 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb1 is up
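The secondary failed to replicate for over an hour, even though the heartbeat log shows both other members back up about a minute after they were reported down. Only a restart of the secondary DB seems to have fixed the problem. This was not a master log corruption issue, because the other secondary was syncing just fine.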
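To put numbers on the heartbeat outage, the "is down" / "is up" transitions in the log above can be paired per host. The following is a minimal Python sketch, not part of the original report. It assumes the log line format shown above; the function name `heartbeat_outages` is hypothetical, and because the log lines carry no year, the `year` default of 2011 is an assumption.

```python
import re
from datetime import datetime

# Matches health-poll state changes such as:
#   Tue Jun 21 08:58:28 [ReplSetHealthPollTask] replSet info <host> is down (...)
# The weekday is skipped; only month, day and time are captured.
STATE_RE = re.compile(
    r"[A-Z][a-z]{2} (?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"\[ReplSetHealthPollTask\] replSet info (?P<host>\S+) is (?P<state>down|up)"
)


def heartbeat_outages(log_text, year=2011):
    """Return a list of (host, down_at, up_at) for each completed outage.

    `year` is an assumption: the log timestamps do not include one.
    """
    down_since = {}  # host -> datetime of the first unanswered heartbeat
    outages = []
    for m in STATE_RE.finditer(log_text):
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        host = m["host"]
        if m["state"] == "down":
            # Keep the earliest "down" if it is reported repeatedly.
            down_since.setdefault(host, ts)
        elif host in down_since:
            outages.append((host, down_since.pop(host), ts))
    return outages


if __name__ == "__main__":
    import sys

    # Usage: python outages.py < mongod.log
    for host, down_at, up_at in heartbeat_outages(sys.stdin.read()):
        seconds = (up_at - down_at).total_seconds()
        print(f"{host}: unreachable for {seconds:.0f}s ({down_at} to {up_at})")
```

Run against the excerpt above, this pairs each member's "down" line with its later "up" line, giving outages of 65 and 66 seconds. That is far shorter than the hour-long replication stall, which suggests the stall was not simply a matter of the members staying unreachable.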