Details
- Type: Bug
- Resolution: Incomplete
- Priority: Critical - P2
- Fix Version/s: None
- Affects Version/s: 1.8.1
- Component/s: None
- Environment: Linux
- Operating System: ALL
Description
The following errors in the secondary's log indicate that it had trouble reaching both of the other DBs:
Tue Jun 21 08:58:28 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Tue Jun 21 08:58:28 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb2 is down (or slow to respond): DBClientBase::findOne: transport error: prod-c0-pacmandb2 query: { replSetHeartbeat: "pacman", v: 2, pv: 1, checkEmpty: false, from: "lab-c0-pacmandb1.lab" }
Tue Jun 21 08:58:30 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Tue Jun 21 08:58:30 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb1 is down (or slow to respond): DBClientBase::findOne: transport error: prod-c0-pacmandb1 query: { replSetHeartbeat: "pacman", v: 2, pv: 1, checkEmpty: false, from: "lab-c0-pacmandb1.lab" }
Tue Jun 21 08:59:33 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb2 is up
Tue Jun 21 08:59:34 [initandlisten] connection accepted from 10.10.***.***:54941 #1492
Tue Jun 21 08:59:34 [initandlisten] connection accepted from 10.10.***.***:33786 #1493
Tue Jun 21 08:59:36 [ReplSetHealthPollTask] replSet info prod-c0-pacmandb1 is up
The secondary then failed to replicate for over an hour, and only a restart of that secondary mongod appears to have fixed the problem. This does not look like corruption of the primary's oplog, because the other secondary kept syncing without issue the whole time.
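For reference, below is a minimal mongo shell sketch of one way to watch for this condition. It is not from the original report; it only relies on the standard rs.status() output (member name, health, stateStr, optimeDate) and compares each secondary's last applied optime against the primary's. The one-hour threshold is an illustrative value chosen to match this incident, not a recommendation.

// Run in the mongo shell against any member of the "pacman" replica set.
// Sketch only: uses standard rs.status() member fields.
var status = rs.status();

// Use the primary's last applied op time as the reference point.
var primary = null;
status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") { primary = m; }
});

status.members.forEach(function (m) {
    var line = m.name + "  health=" + m.health + "  state=" + m.stateStr;
    if (primary && m.stateStr === "SECONDARY") {
        // Subtracting two Date objects in JavaScript yields milliseconds.
        var lagSecs = (primary.optimeDate - m.optimeDate) / 1000;
        line += "  lag=" + lagSecs + "s";
        if (lagSecs > 3600) {   // example threshold: an hour behind, as in this incident
            line += "  <-- possibly stalled";
        }
    }
    print(line);
});

Running rs.status() on the affected secondary itself would also show whether it still considers the other members unreachable, which is the same information the ReplSetHealthPollTask messages above are reporting.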