[SERVER-1929] handle replica set flapping
Created: 12/Oct/10  Updated: 12/Jul/16  Resolved: 09/Oct/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.3.0

Type: Improvement
Priority: Major - P3
Reporter: Dwight Merriman
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-6037 RS members should report on heartbeat... Closed

 Description   

If TCP connections flap briefly, we should not fail over to a secondary. At a minimum, we should make sure no failover happens in the trivial case where a single reconnect attempt succeeds after a socket exception.

Please research. Maybe we can have a test for this too: we could add an option to the replSetTest command to close all connections. Something like

MessagingPort::closeAllSockets(0);

might work for testing.
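
As an illustration of the behavior requested above, here is a minimal sketch of a heartbeat that reconnects and retries once after a socket exception before reporting a member as down. It is not the server's implementation: `SocketException`, `MemberHealth`, `heartbeatWithRetry`, and `FlakyPeer` are hypothetical names made up for this example, and `FlakyPeer::closeAllSockets()` only loosely imitates what a "close all connections" test hook might do.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for a socket-level failure; not a MongoDB type.
struct SocketException : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Outcome of one heartbeat round, as the failover logic would see it.
enum class MemberHealth { Up, Down };

// Sends one heartbeat via `sendHeartbeat`. If an attempt throws
// SocketException, calls `reconnect` and tries again, up to `maxRetries`
// extra attempts. The member is reported Down only if every attempt fails.
MemberHealth heartbeatWithRetry(const std::function<void()>& sendHeartbeat,
                                const std::function<void()>& reconnect,
                                int maxRetries = 1) {
    assert(maxRetries >= 0);
    for (int attempt = 0; attempt <= maxRetries; ++attempt) {
        try {
            if (attempt > 0) {
                reconnect();  // open a fresh connection before retrying
            }
            sendHeartbeat();
            return MemberHealth::Up;
        } catch (const SocketException&) {
            // Possibly transient: loop again if attempts remain.
        }
    }
    return MemberHealth::Down;
}

// Hypothetical test double: a peer whose connection can be force-closed.
class FlakyPeer {
public:
    void closeAllSockets() { connected_ = false; }
    void setReachable(bool reachable) { reachable_ = reachable; }
    void reconnect() {
        if (!reachable_) throw SocketException("connect failed");
        connected_ = true;
    }
    void heartbeat() {
        if (!connected_) throw SocketException("socket closed");
        ++heartbeatsReceived_;
    }
    int heartbeatsReceived() const { return heartbeatsReceived_; }

private:
    bool connected_ = true;
    bool reachable_ = true;
    int heartbeatsReceived_ = 0;
};
```

With this sketch, closing the peer's sockets and then calling `heartbeatWithRetry` still yields `MemberHealth::Up` as long as the single reconnect succeeds, which is the trivial flapping case described above. A peer that stays unreachable yields `MemberHealth::Down`.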



 Comments   
Comment by auto [ 12/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-12T10:54:33-07:00

Message: SERVER-1929 Fix test race condition
Branch: master
https://github.com/mongodb/mongo/commit/fd5cef9ff6475b67aadf95d25dec45875a670752

Comment by auto [ 11/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-11T12:14:18-07:00

Message: SERVER-1929 Replace literal with a constant
Branch: master
https://github.com/mongodb/mongo/commit/6453e95ec398eb40623c4a63c7f15b479822f153

Comment by auto [ 11/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-11T09:04:28-07:00

Message: SERVER-1929 Prevent outgoing heartbeat ports from closing on stepdown
Branch: master
https://github.com/mongodb/mongo/commit/9a6aa4f7e181b18fc3ed5868773930904629a3b4

Comment by auto [ 04/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-04T15:56:46-07:00

Message: SERVER-1929 Fix test to handle a stepdown
Branch: master
https://github.com/mongodb/mongo/commit/efbca4cd123faaa1c9b12d82612a95e525c1bd98

Comment by auto [ 04/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-04T12:18:29-07:00

Message: SERVER-1929 Remove unused heartbeat options from stepdown logic

Fixed test because stepdown is so much faster that the connection is dead
by the time ismaster is called.
Branch: master
https://github.com/mongodb/mongo/commit/07a6fd4726a8e876266319cd8d22d64111cf8688

Comment by auto [ 04/Oct/12 ]

Author: Kristina <kristina@10gen.com>, 2012-10-04T08:59:31-07:00

Message: SERVER-1929 Set return value on heartbeat retry
Branch: master
https://github.com/mongodb/mongo/commit/8ff934fbea3d470c5e499480abc6e962a1690a38

Comment by auto [ 21/Sep/12 ]

Author: Kristina <kristina@10gen.com>, 2012-09-21T09:24:03-07:00

Message: Add heartbeat timeout setting SERVER-1929
Branch: master
https://github.com/mongodb/mongo/commit/ac1521cf43903313bd4d4d36c95b4e5c966ac32c

Comment by auto [ 14/Sep/12 ]

Author: Kristina <kristina@10gen.com>, 2012-09-14T11:48:26-07:00

Message: Track heartbeats received for health SERVER-1929
Branch: master
https://github.com/mongodb/mongo/commit/b35e7705df9c090fa86db8a2c1ca52437b9aeaf1

Comment by auto [ 05/Sep/12 ]

Author: Kristina <kristina@10gen.com>, 2012-09-05T13:58:34-07:00

Message: Check pointer before dereferencing SERVER-1929
Branch: master
https://github.com/mongodb/mongo/commit/d0bffe83d5c9c4d15347e3ec2a61a83550e770cc

Comment by auto [ 05/Sep/12 ]

Author: Kristina <kristina@10gen.com>, 2012-09-05T12:18:18-07:00

Message: Allow socket timeout to be set after connecting SERVER-1929
Branch: master
https://github.com/mongodb/mongo/commit/1e7734b6adb86a0bcdde97208d102ed51cb08f7f

Comment by Richard Kreuter (Inactive) [ 16/Jul/12 ]

Kristina tells me this is the issue she considers canonical for changing stuff about RS heartbeats and things.

Making RS failovers not happen unless necessary ought to be a major issue.

Generated at Thu Feb 08 02:58:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.