|
Notes:
Success:
d1c2ae72c482 Linux 64-bit DEBUG DUR OFF replicasets
Attachment: linux_64_debug_duroff_1340_test_replicasets_0_initial_sync1_success.txt
Failure:
9fcdfeb195c4 Linux 64-bit DEBUG DUR OFF replicasets
Attachment: linux_64_debug_duroff_1339_test_replicasets_0_initial_sync1_fail.txt
Observations:
9. Bring #2 back up
|
ReplSetTest n is : 1
|
m31000| 2013-10-14T21:28:48.147+0000 [rsHealthPoll] DBClientCursor::init call() failed
|
m31000| 2013-10-14T21:28:48.147+0000 [rsHealthPoll] dev: lastError==0 won't report:DBClientBase::findN: transport error: ip-10-142-147-116:31001 ns: admin.$cmd query: { replSetHeartbeat: "jstests_initsync1", v: 2, pv: 1, checkEmpty: false, from: "ip-10-142-147-116:31000" }
|
m31000| 2013-10-14T21:28:48.147+0000 [rsHealthPoll] replset info ip-10-142-147-116:31001 heartbeat failed, retrying
|
m31000| 2013-10-14T21:28:48.148+0000 [rsHealthPoll] warning: Failed to connect to 10.142.147.116, reason: errno:111 Connection refused
|
m31000| 2013-10-14T21:28:48.148+0000 [rsHealthPoll] unable to set SO_RCVTIMEO
|
m31000| 2013-10-14T21:28:48.149+0000 [rsHealthPoll] unable to set SO_SNDTIMEO
|
m31000| 2013-10-14T21:28:48.149+0000 [rsHealthPoll] replSet info ip-10-142-147-116:31001 is down (or slow to respond):
|
m31000| 2013-10-14T21:28:48.149+0000 [rsHealthPoll] replSet member ip-10-142-147-116:31001 is now in state DOWN
|
m31000| 2013-10-14T21:28:48.946+0000 [rsHealthPoll] replSet member ip-10-142-147-116:31002 is now in state SECONDARY
|
m31000| 2013-10-14T21:28:49.394+0000 [conn5] end connection 10.142.147.116:58325 (4 connections now open)
|
m31000| 2013-10-14T21:28:50.149+0000 [rsHealthPoll] unable to set SO_RCVTIMEO
|
m31000| 2013-10-14T21:28:50.149+0000 [rsHealthPoll] unable to set SO_SNDTIMEO
|
m31000| 2013-10-14T21:28:50.151+0000 [rsHealthPoll] warning: Failed to connect to 10.142.147.116, reason: errno:111 Connection refused
|
m31000| 2013-10-14T21:28:50.151+0000 [rsHealthPoll] replset info ip-10-142-147-116:31001 heartbeat failed, retrying
|
ReplSetTest n: 1 ports: [ 31000, 31001 ] 31001 number
|
m31000| 2013-10-14T21:28:50.151+0000 [rsHealthPoll] unable to set SO_RCVTIMEO
|
m31000| 2013-10-14T21:28:50.151+0000 [rsHealthPoll] unable to set SO_SNDTIMEO
|
m31000| 2013-10-14T21:28:50.156+0000 [rsHealthPoll] warning: Failed to connect to 10.142.147.116, reason: errno:111 Connection refused
|
m31000| 2013-10-14T21:28:50.156+0000 [rsHealthPoll] unable to set SO_RCVTIMEO
|
m31000| 2013-10-14T21:28:50.156+0000 [rsHealthPoll] unable to set SO_SNDTIMEO
|
m31000| 2013-10-14T21:28:52.156+0000 [rsHealthPoll] unable to set SO_RCVTIMEO
|
The test is supposed to kill the node on port 31001 and bring it back up, but we start to see a lot of socket errors after we bring the node up. We also see from the replica set status output that the node is up, but can't see the other two nodes:
{
|
"set" : "jstests_initsync1",
|
"date" : ISODate("2013-10-14T21:37:10Z"),
|
"myState" : 2,
|
"members" : [
|
{
|
"_id" : 0,
|
"name" : "ip-10-142-147-116:31000",
|
"health" : 0,
|
"state" : 8,
|
"stateStr" : "(not reachable/healthy)",
|
"uptime" : 0,
|
"optime" : Timestamp(0, 0),
|
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
|
"lastHeartbeat" : ISODate("2013-10-14T21:37:03Z"),
|
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
|
"pingMs" : 0
|
},
|
{
|
"_id" : 1,
|
"name" : "ip-10-142-147-116:31001",
|
"health" : 1,
|
"state" : 2,
|
"stateStr" : "SECONDARY",
|
"uptime" : 282,
|
"optime" : Timestamp(1381786108, 1),
|
"optimeDate" : ISODate("2013-10-14T21:28:28Z"),
|
"self" : true
|
},
|
{
|
"_id" : 2,
|
"name" : "ip-10-142-147-116:31002",
|
"health" : 0,
|
"state" : 8,
|
"stateStr" : "(not reachable/healthy)",
|
"uptime" : 0,
|
"optime" : Timestamp(0, 0),
|
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
|
"lastHeartbeat" : ISODate("2013-10-14T21:37:05Z"),
|
"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
|
"pingMs" : 0
|
}
|
],
|
"ok" : 1
|
}
|
We don't see either of these symptoms in the successful test.
|