[SERVER-7275] node can't roll back if behind minValid Created: 05/Oct/12 Updated: 06/Dec/22 Resolved: 08/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dwight Merriman | Assignee: | Backlog - Replication Team |
| Resolution: | Duplicate | Votes: | 2 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: | |
| Description |
|
Consider a replica set with three members at this state of oplogs:
S1: 1 2 3 4 5 6 7 8 9 (primary)
Now suppose S2 starts applying the batch 2..9. It early-commits ops {2,4,6,8}, then crashes. After the crash, the oplog for S2 is unchanged, but writes for opids {2,4,6,8} have occurred in the datafiles.
On restart, S2 would recover OK (if application is idempotent) as long as S1 is up. However, suppose S1 goes down first (perhaps permanently). Now S2 and S3 are the remaining set members on S2's restart, and S3 has the latest data. After recovery we have:
S1: down
S3: 1 2 3 4 5
*However, S2 has also written ops {6,8}, and they are never rolled back. |
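The scenario above can be replayed with a small toy model. This is only a sketch, not server code: the `Member` class and its methods are hypothetical, and S2's starting oplog of `[1]` is an assumption, since the description does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    """Toy replica set member (hypothetical model, not MongoDB internals)."""
    name: str
    oplog: list = field(default_factory=list)   # op ids recorded in the oplog
    applied: set = field(default_factory=set)   # op ids whose writes reached the datafiles

    def apply_batch_partially(self, batch, committed):
        """Write only the `committed` ops to the datafiles, then 'crash'
        before the oplog is updated."""
        for op in batch:
            if op in committed:
                self.applied.add(op)

    def sync_from(self, source):
        """Idempotently apply every op the source has past our oplog tail."""
        last = self.oplog[-1] if self.oplog else 0
        for op in source.oplog:
            if op > last:
                self.applied.add(op)
                self.oplog.append(op)

    def stray_ops(self):
        """Ops present in the datafiles but absent from the oplog."""
        return sorted(self.applied - set(self.oplog))


def run_scenario():
    # Assumption: S2 had only op 1 before the batch started.
    s2 = Member("S2", oplog=[1], applied={1})
    s3 = Member("S3", oplog=[1, 2, 3, 4, 5], applied={1, 2, 3, 4, 5})
    s2.apply_batch_partially(range(2, 10), committed={2, 4, 6, 8})
    # S1 is gone, so S2 can only recover from S3.
    s2.sync_from(s3)
    return s2.oplog, s2.stray_ops()
```

In this model `run_scenario()` leaves S2 with the same oplog as S3, while ops 6 and 8 remain in its datafiles with no oplog entry that a rollback could use to find them.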
| Comments |
| Comment by Eric Milkie [ 08/Sep/16 ] |
|
The situation described here is fixed by the work in |
| Comment by Scott Hernandez (Inactive) [ 21/Apr/16 ] |
|
Since it is now possible that the oplog has been written locally, it might be possible to finish applying the batch to reach a consistent state and then do the rollback afterwards. This would only be possible if the oplog entries were recorded and are available on restart, which is now possible since we write the local oplog entries and apply them concurrently. This would be the same behavior as using the oplog as a buffer, but that change promises that the oplog will be there, instead of just possibly being there depending on execution and journaling order. |
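A rough sketch of the "finish applying, then roll back" ordering follows. It assumes the whole batch was recorded in the local oplog before the crash and treats rollback as simply dropping ops past the common point. `finish_applying` and `roll_back` are hypothetical helpers, not server functions.

```python
def finish_applying(oplog, applied):
    """Re-apply every locally recorded oplog entry.
    Idempotent ops make repeated application harmless."""
    return applied | set(oplog)


def roll_back(oplog, applied, sync_source_oplog):
    """Undo everything past the last op shared with the sync source."""
    common = 0
    for mine, theirs in zip(oplog, sync_source_oplog):
        if mine != theirs:
            break
        common += 1
    undone = oplog[common:]
    return oplog[:common], applied - set(undone), undone


# Assumption: S2 recorded the full batch 2..9 in its oplog but only
# ops {2,4,6,8} reached the datafiles before the crash.
s2_oplog = list(range(1, 10))
s2_applied = {1, 2, 4, 6, 8}

s2_applied = finish_applying(s2_oplog, s2_applied)   # consistent at op 9
s2_oplog, s2_applied, undone = roll_back(s2_oplog, s2_applied, [1, 2, 3, 4, 5])
```

Because the node first reaches a consistent state at the end of its own oplog, the rollback step can see ops 6 through 9 and undo them, so nothing is left stranded in the datafiles.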
| Comment by Eric Milkie [ 23/Feb/15 ] |
|
Actually in the scenario above, S2 would go down, since it would attempt rollback but its minValid was not pointing at the end of its oplog. |
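The guard described in this comment might look roughly like the check below. It is a sketch under assumptions: `check_can_roll_back` and `FatalRollbackError` are hypothetical names, and the real server condition may differ in detail.

```python
class FatalRollbackError(RuntimeError):
    """Raised when a node cannot safely attempt rollback (hypothetical)."""


def check_can_roll_back(oplog, min_valid):
    """Refuse rollback when the oplog tail is behind minValid, i.e. the
    datafiles may hold writes from a batch the oplog does not yet cover."""
    last = oplog[-1] if oplog else None
    if last is None or last < min_valid:
        raise FatalRollbackError(
            f"oplog ends at {last}, but minValid is {min_valid}"
        )
    return True
```

Assuming S2's oplog ends at op 1 and minValid was set to the end of the batch (op 9), `check_can_roll_back([1], 9)` raises, which corresponds to S2 going down rather than silently keeping ops {6,8}.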
| Comment by Dwight Merriman [ 05/Oct/12 ] |
|
Three possible solutions: 2b has an advantage over 2a in that it doesn't change the oplog format. Thus it is backward compatible and in addition won't mess up anyone who queries the oplog themselves for custom purposes. It's kind of like a journal for the current batch. |