[SERVER-19244] Secondary position information can be erroneous if nodes leave and rejoin a cluster with less data than before
Created: 01/Jul/15 Updated: 06/Dec/22 Resolved: 11/Apr/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Description |
For example, say the node with member id 2 in the config crashes. Its data is restored from a backup taken a day earlier, and the node is restarted. The node now reports upstream, for member id 2, a position that is behind the position member id 2 had previously reported. This is a problem because the earlier, higher position is not replaced, so the node can appear to contain data that it does not actually have.
| Comments |
| Comment by Eric Milkie [ 04/Jan/20 ] |
I'm pretty sure this is still the case. We can't move a reported node's position backwards, because updatePosition messages from the same node can arrive at a primary out of order: if the spanning tree changes while the messages are propagating, the messages can take different paths through it.
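A minimal Python sketch of the tradeoff described in the comment, assuming a primary that keeps only the highest position reported for each member. `PositionTable` and `on_update_position` are hypothetical names used for illustration, and positions are simplified to integers; this is not the server's actual code. It shows that the same "never move backwards" rule that correctly absorbs a late, reordered updatePosition message also hides the genuine regression of a node restored from an older backup.

```python
from dataclasses import dataclass, field

# Hypothetical model of a primary's per-member position tracking.
# Positions are plain integers here; this is an illustrative sketch only.


@dataclass
class PositionTable:
    # member id -> highest position seen so far for that member
    positions: dict[int, int] = field(default_factory=dict)

    def on_update_position(self, member_id: int, position: int) -> None:
        """Record a reported position, ignoring any report that would move
        the member backwards (it is assumed to be a stale, reordered message)."""
        current = self.positions.get(member_id)
        if current is None or position > current:
            self.positions[member_id] = position


def demo() -> tuple[int, int]:
    table = PositionTable()

    # Case 1: two reports from member 2 travel different paths through the
    # spanning tree and arrive out of order. Ignoring the lower one is correct.
    table.on_update_position(2, 100)  # newer report arrives first
    table.on_update_position(2, 90)   # older report arrives late and is ignored
    out_of_order = table.positions[2]

    # Case 2: member 2, last seen at position 100, is restored from a backup
    # that only reaches position 60. Its truthful report looks exactly like a
    # late message, so it is ignored as well.
    table.on_update_position(2, 60)
    after_restore = table.positions[2]
    return out_of_order, after_restore


if __name__ == "__main__":
    # Member 2 still appears to hold data up to position 100 in both cases.
    print(demo())
```

In this model, the primary cannot distinguish a late message from a real regression using the reported position alone.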
| Comment by Judah Schvimer [ 03/Jan/20 ] |
I'm not sure whether this is still the case. This would also be a problem for a node that resyncs from a stale node. We will need to address this as part of initial sync semantics (PM-1096). Marking as blocked on that.