[SERVER-20177] incoming heartbeats should update liveness table Created: 28/Aug/15  Updated: 25/Jan/17  Resolved: 03/Sep/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.1.8

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: RPL 9 (09/18/15)
Participants:

 Description   

Because the primary does not know which secondaries are currently connected via the spanning tree, it cannot tell when secondaries are down unless it relies on heartbeats.
We currently increase the heartbeat rate for secondaries that have no sync source, but those heartbeats do not update the primary's liveness table. The work for this ticket is to make such incoming heartbeats update the liveness table just as outgoing heartbeats do. We may need to include more information in the outgoing request so that the receiver has enough data.
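A minimal sketch of the idea, assuming the liveness table is simply a per-member "last heard from" timestamp compared against a timeout. This is not MongoDB's actual replication code: the class `LivenessTable`, its method names, the member names and the 10-second timeout are all hypothetical. It shows a primary treating an incoming heartbeat request (and, per the later commit message, an incoming updatePosition command) the same way it treats a response to its own outgoing heartbeat.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LivenessTable:
    """Hypothetical per-member "last heard from" table kept by a primary.

    Not MongoDB's implementation: class name, method names and the
    timeout value are illustrative assumptions.
    """

    timeout_secs: float  # assumed: silence longer than this means "down"
    last_seen: Dict[str, float] = field(default_factory=dict)

    def _touch(self, member: str, now: float) -> None:
        # Record contact, never moving a member's timestamp backwards.
        prev = self.last_seen.get(member)
        if prev is None or now > prev:
            self.last_seen[member] = now

    def on_heartbeat_response(self, member: str, now: float) -> None:
        # Existing path: a reply to our own outgoing heartbeat.
        self._touch(member, now)

    def on_incoming_heartbeat(self, member: str, now: float) -> None:
        # Path requested by this ticket: a heartbeat *request* sent to us
        # by `member` also shows that `member` is alive.
        self._touch(member, now)

    def on_update_position(self, member: str, now: float) -> None:
        # Incoming updatePosition commands are treated the same way.
        self._touch(member, now)

    def is_up(self, member: str, now: float) -> bool:
        prev = self.last_seen.get(member)
        return prev is not None and now - prev <= self.timeout_secs

    def live_members(self, now: float) -> List[str]:
        return sorted(m for m in self.last_seen if self.is_up(m, now))


table = LivenessTable(timeout_secs=10.0)
# A secondary with no sync source heartbeats the primary directly.
table.on_incoming_heartbeat("s2", now=100.0)
# Another secondary is only heard from via updatePosition.
table.on_update_position("s1", now=101.0)
print(table.live_members(now=105.0))  # ['s1', 's2']
print(table.is_up("s2", now=111.0))   # False: silent for 11 s > 10 s
```

Routing every kind of inbound contact through one `_touch` helper keeps the three paths consistent; the tradeoff, raised in the comments on this ticket, is that inbound-only contact can make a member look alive during an asymmetric network partition.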



 Comments   
Comment by Githook User [ 03/Sep/15 ]

Author:

Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-20177 Incoming heartbeats should update liveness table in PV1.
Branch: master
https://github.com/mongodb/mongo/commit/8331b0e0bcc57821d36b790866a1a692cee2408e

Comment by Githook User [ 02/Sep/15 ]

Author:

Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-20177 update liveness table from incoming heartbeats

This is the same as updating the table based on incoming updatePosition commands.
Branch: master
https://github.com/mongodb/mongo/commit/6cb1c6dc32da19dff41f40eec6beea9abc6bbdd4

Comment by Eric Milkie [ 29/Aug/15 ]

We did! But I think I have a way to avoid such issues this time.

Comment by Andy Schwerin [ 29/Aug/15 ]

Didn't we remove similar behavior during the development of 3.0, because it prevented failover during asymmetric network partitions?

Generated at Thu Feb 08 03:53:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.