[SERVER-19244] Secondary position information can be erroneous if nodes leave and rejoin a cluster with less data than before Created: 01/Jul/15  Updated: 06/Dec/22  Resolved: 11/Apr/22

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends:
Documented:
    is documented by DOCS-15225 Document risks of node rejoining a cl... Backlog
Related:
    related to SERVER-46085 Fail initial sync attempt if sync sou... Closed
    is related to SERVER-17934 Do not report upstream progress while... Closed
    is related to SERVER-18511 Report upstream progress when initial... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

For example, say a node with member id 2 in the config crashes. Its data is restored from a backup taken a day earlier, and the node is restarted. The node now reports upstream, for member id 2, a position that is behind where id 2 had previously reported.
The current position-management code does not change a node's recorded position if it is reported to have moved backward.

This is a problem because a node can thus appear to contain data that it does not actually have.
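
To make the failure mode concrete, here is a minimal sketch of the "never move a member's position backwards" rule and how it hides a restore from backup. This is not the server's actual code; OpTime is simplified to two integers, and PositionTracker/updatePosition are hypothetical names used only for illustration.

#include <cstdint>
#include <iostream>
#include <map>

// Simplified stand-in for a replication optime.
struct OpTime {
    int64_t term;
    int64_t timestamp;
    bool operator<(const OpTime& rhs) const {
        return term < rhs.term || (term == rhs.term && timestamp < rhs.timestamp);
    }
};

class PositionTracker {
public:
    // Ignores any report that would move the member's position backwards.
    void updatePosition(int memberId, const OpTime& reported) {
        OpTime& current = _lastKnown[memberId];
        if (current < reported) {
            current = reported;
        }
        // else: the older report is silently dropped -- the behavior this ticket flags.
    }

    OpTime lastKnown(int memberId) const {
        auto it = _lastKnown.find(memberId);
        return it == _lastKnown.end() ? OpTime{0, 0} : it->second;
    }

private:
    std::map<int, OpTime> _lastKnown;
};

int main() {
    PositionTracker tracker;
    tracker.updatePosition(2, {1, 100});  // member 2 reports optime (1, 100)
    // Member 2 crashes, is restored from a day-old backup, and restarts,
    // reporting a much older position. The tracker refuses to move backwards,
    // so the primary still believes member 2 holds data up to (1, 100).
    tracker.updatePosition(2, {1, 20});
    std::cout << "member 2 recorded at term=" << tracker.lastKnown(2).term
              << " ts=" << tracker.lastKnown(2).timestamp << "\n";  // term=1 ts=100
}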



 Comments   
Comment by Eric Milkie [ 04/Jan/20 ]

I'm pretty sure this is still the case. We can't move reported nodes' positions backwards because it's possible for updatePosition messages from the same node to arrive out of order at a primary, if the messages take different paths through the spanning tree and the spanning tree changes during propagation of the messages.
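
A small illustration of the tradeoff described above, assuming positions are reduced to a single integer optime (applyReports is a hypothetical helper, not server code): the monotonic clamp makes the result correct regardless of the order in which reports from the same node arrive, but the same clamp masks a genuine regression such as a restore from backup.

#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Apply updatePosition-style reports in arrival order, never moving backwards.
int64_t applyReports(const std::vector<int64_t>& reportsInArrivalOrder) {
    int64_t lastKnown = 0;
    for (int64_t reported : reportsInArrivalOrder) {
        // An older report arriving late must not overwrite a newer one.
        lastKnown = std::max(lastKnown, reported);
    }
    return lastKnown;
}

int main() {
    // The node actually advanced 50 -> 80, but the reports arrive reversed
    // because they took different paths through the spanning tree.
    assert(applyReports({80, 50}) == 80);  // clamp gives the right answer
    assert(applyReports({50, 80}) == 80);
    // The same clamp masks a real regression (restore from backup):
    assert(applyReports({80, 20}) == 80);  // stale 80 is kept, 20 is the truth
}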

Comment by Judah Schvimer [ 03/Jan/20 ]

I'm not sure if this is still the case. This would also be a problem for a node that resyncs from a stale node. We will need to address this as part of initial sync semantics (PM-1096). Marking as blocked on that.
