[SERVER-3723] 1.8.3 failover delayed 3 minutes when arbiter 2.0
Created: 30/Aug/11  Updated: 29/Feb/12  Resolved: 30/Aug/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.0-rc0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Tony Hannan
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Two-member replica set running version 1.8.3, plus an arbiter running version 2.0.
Stop the primary.

Problem: The secondary does not become primary until about 3 minutes later (see log below).

No problem if all nodes are 1.8.3.
No problem if all nodes are 2.0.
No problem if the members are 2.0 and the arbiter is 1.8.3.

Secondary log (primary was 10.182.38.10):
Tue Aug 30 17:52:38 [replica set sync] replSet syncThread: 10278 dbclient error communicating with server: 10.182.38.10:27017
Tue Aug 30 17:52:38 [conn5] end connection 10.182.38.10:34232
Tue Aug 30 17:52:40 [ReplSetHealthPollTask] DBClientCursor::init call() failed
Tue Aug 30 17:52:40 [ReplSetHealthPollTask] replSet info 10.182.38.10:27017 is down (or slow to respond): DBClientBase::findOne: transport error: 10.182.38.10:27017 query: { replSetHeartbeat: "rsA", v: 2, pv: 1, checkEmpty: false, from: "10.182.38.9:27017" }
Tue Aug 30 17:52:40 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:52:45 [initandlisten] connection accepted from 207.239.86.46:51353 #7
Tue Aug 30 17:52:46 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:52:52 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:52:58 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:53:02 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
...
Tue Aug 30 17:55:28 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:55:32 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:55:34 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:55:40 [rs Manager] replSet not electing self, not all members up and we have been up less than 5 minutes
Tue Aug 30 17:55:46 [rs Manager] replSet info electSelf 1
Tue Aug 30 17:55:46 [rs Manager] replSet PRIMARY
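
To put a number on the delay, the timestamps in the log above can be subtracted directly. The following is a minimal sketch, not part of the original report: the helper names parse_timestamp and failover_delay are hypothetical, the year 2011 is assumed from the ticket's creation date (the log lines omit it), and month-name parsing assumes an English locale. The three sample lines are copied from the log; the health-poll line is shown here without its trailing heartbeat query.

```python
from datetime import datetime

# Three lines copied from the secondary log in this ticket.
LOG_LINES = [
    "Tue Aug 30 17:52:38 [replica set sync] replSet syncThread: 10278 dbclient error communicating with server: 10.182.38.10:27017",
    "Tue Aug 30 17:52:40 [ReplSetHealthPollTask] replSet info 10.182.38.10:27017 is down (or slow to respond)",
    "Tue Aug 30 17:55:46 [rs Manager] replSet PRIMARY",
]


def parse_timestamp(line, year=2011):
    """Parse the leading 'Tue Aug 30 17:52:38' stamp; the year is an assumption."""
    stamp = line[4:19]  # skip the weekday, keep 'Aug 30 17:52:38'
    return datetime.strptime(f"{stamp} {year}", "%b %d %H:%M:%S %Y")


def failover_delay(lines):
    """Time between the first sync error and the 'replSet PRIMARY' transition."""
    lost = next(l for l in lines if "dbclient error" in l)
    promoted = next(l for l in lines if l.endswith("replSet PRIMARY"))
    return parse_timestamp(promoted) - parse_timestamp(lost)


print(failover_delay(LOG_LINES))  # 0:03:08
```

This gives 3 minutes 8 seconds between losing the primary and the secondary stepping up, which matches the "3 minutes" in the summary.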



 Comments   
Comment by Kristina Chodorow (Inactive) [ 30/Aug/11 ]

Phew! Great.

Comment by Tony Hannan [ 30/Aug/11 ]

Ah, you're right. With members already up for 5 minutes the failover is fast.

Comment by Kristina Chodorow (Inactive) [ 30/Aug/11 ]

Can you retry, waiting until the secondary has been up for at least 5 minutes before killing the primary? This doesn't look like an issue with the arbiter.
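
The behavior discussed in these comments can be read off the repeated log message "not electing self, not all members up and we have been up less than 5 minutes". The sketch below models only that one condition as the log words it. It is an illustration, not the actual server code: the names GRACE_PERIOD_SECONDS and may_elect_self are hypothetical, and real elections involve further checks (for example, votes from other members) that are left out here.

```python
# Grace period taken from the log text "up less than 5 minutes".
GRACE_PERIOD_SECONDS = 5 * 60


def may_elect_self(uptime_seconds, all_members_up):
    """Hypothetical gate mirroring the log message.

    The node holds back only when BOTH conditions in the message are true:
    some member is unreachable AND the node has been up less than 5 minutes.
    """
    holding_back = (not all_members_up) and uptime_seconds < GRACE_PERIOD_SECONDS
    return not holding_back


# Primary stopped while the secondary has been up for only 2 minutes: wait.
print(may_elect_self(120, all_members_up=False))  # False
# Same failure after the secondary has been up 5 minutes: election may proceed.
print(may_elect_self(300, all_members_up=False))  # True
```

Under this reading, the delay depends on how recently the secondary was started rather than on the arbiter's version, which is consistent with the fast failover seen once the members had been up for 5 minutes.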
