[SERVER-48381] Allow syncing from a node with the same optime if it doesn't introduce a cycle Created: 21/May/20  Updated: 24/Jun/20  Resolved: 24/Jun/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Samyukta Lanka Assignee: Xuerui Fa
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2020-05-19 at 12.14.19 PM.png    
Issue Links:
Related
Sprint: Repl 2020-06-15, Repl 2020-06-29
Participants:

 Description   

When re-evaluating its sync source, a node checks whether there is a closer node that is ahead of it, using heartbeat data to determine whether the other node is ahead. In some configurations, nodes are in the same data center (and therefore very close to each other), but none of them can choose another as a sync source because they are at similar optimes and stale heartbeats prevent each node from concluding that it is behind the others.

For example, consider the attached image. If A and B are both syncing from the primary, they likely have similar network latency. For B to decide whether to switch to A, it needs to know from heartbeat information that A is ahead of it. Since heartbeats happen only every 2 seconds and can be stale, B might not see A as ahead of it for a long time, which prevents the topology from settling on a single link between data centers.
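To make the stale-heartbeat problem concrete, here is a minimal sketch, not the server's actual logic. It assumes optimes can be modelled as plain integers and that a node only knows a peer's optime as of the last heartbeat; `HeartbeatView` and `is_strictly_ahead` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatView:
    """What a node last learned about a peer from a heartbeat (hypothetical shape)."""
    optime: int          # peer's last applied optime as of that heartbeat (simplified to an int)
    received_at: float   # local time in seconds when the heartbeat arrived


def is_strictly_ahead(my_optime: int, peer: HeartbeatView) -> bool:
    # The rule described in this ticket: a candidate must be ahead of the syncing node.
    return peer.optime > my_optime


# B keeps applying ops from the primary between heartbeats, so its own optime
# advances while its view of A stays frozen at the last heartbeat.
view_of_a = HeartbeatView(optime=100, received_at=0.0)
b_optime_at_heartbeat = 100
b_optime_one_second_later = 105

can_switch_at_heartbeat = is_strictly_ahead(b_optime_at_heartbeat, view_of_a)
can_switch_later = is_strictly_ahead(b_optime_one_second_later, view_of_a)
```

In this sketch both checks are false: at the heartbeat A and B are tied, and afterwards B's own progress makes A look behind even if A has kept pace.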

One possible way to solve this is to relax the constraint that a node must be ahead of the syncing node to be considered a valid sync source. Ideally we could make sure that we only do this when it wouldn't cause sync source cycles. If that's not possible, one option is to implement a distributed cycle detection algorithm.
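As a rough illustration of what "wouldn't cause sync source cycles" means, the sketch below walks the sync-source chain upward from a candidate and reports a cycle if it reaches the node doing the choosing. It assumes a consistent snapshot of every member's current sync source (the hypothetical `sync_source_of` map), which a real node does not have; that gap is why a distributed algorithm is mentioned as the fallback.

```python
from typing import Dict, Optional


def would_create_cycle(
    me: str,
    candidate: str,
    sync_source_of: Dict[str, Optional[str]],
) -> bool:
    """Return True if `me` syncing from `candidate` would close a loop.

    `sync_source_of` maps each member to its current sync source, or None
    for a member with no sync source (for example, the primary).
    """
    seen = set()
    node: Optional[str] = candidate
    # Follow the chain candidate -> its source -> ... until it ends or repeats.
    while node is not None and node not in seen:
        if node == me:
            return True
        seen.add(node)
        node = sync_source_of.get(node)
    return False


# Hypothetical topology: A syncs from primary P, B syncs from A.
topology = {"P": None, "A": "P", "B": "A"}
a_from_b = would_create_cycle("A", "B", topology)  # B already syncs from A
b_from_a = would_create_cycle("B", "A", topology)  # A's chain ends at P
```

With stale or partial information about other members' sync sources, this local check can miss cycles, so it is only a starting point.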



 Comments   
Comment by Samyukta Lanka [ 24/Jun/20 ]

Closing as won't fix because we will prioritize this work separately from this project.

Comment by Judah Schvimer [ 21/May/20 ]

One idea here is to use the memberId as the tiebreaker. It's mostly static, so this seems plausible, but we'd need to use TLA+ to confirm cycles aren't possible, even in the face of reconfigs.
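A minimal sketch of the tiebreaker idea, under stated assumptions: optimes are simplified to integers, and the choice that the lower memberId wins a tie is arbitrary (the comment does not specify a direction). `is_valid_sync_source` is a hypothetical name, not a server function.

```python
def is_valid_sync_source(
    my_optime: int,
    my_member_id: int,
    candidate_optime: int,
    candidate_member_id: int,
) -> bool:
    """Relaxed rule: accept a candidate that is ahead, or tied with a lower memberId."""
    if candidate_optime != my_optime:
        # Unchanged from the strict rule: the candidate must be ahead.
        return candidate_optime > my_optime
    # Tie on optime: break it by memberId (assumed direction: lower id wins).
    return candidate_member_id < my_member_id


# With equal optimes, member 2 may sync from member 1, but not the reverse.
two_from_one = is_valid_sync_source(100, 2, 100, 1)
one_from_two = is_valid_sync_source(100, 1, 100, 2)
```

For a single consistent snapshot, ordering members by (optime, memberId) means two tied nodes cannot pick each other. The sketch says nothing about stale heartbeat data, optimes changing over time, or reconfigs that change memberIds, which is the part that would need TLA+ verification.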
