[SERVER-26253] Get new sync source rather than rollback on RemoteOplogStale error Created: 22/Sep/16 Updated: 29/Dec/16 Resolved: 29/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Judah Schvimer |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Sprint: | Repl 2017-01-23 |
| Participants: |
| Linked BF Score: | 15 |
| Description |
|
A RemoteOplogStale error means that the OplogFetcher received an empty batch for its remote oplog query, whose predicate matches entries with a timestamp greater than or equal to that of the fetcher's own most recent oplog entry. In other words, the sync source's most recent oplog entry is older than the node's own most recent oplog entry. This should lead to choosing a new sync source rather than to a rollback. |
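The distinction described above can be sketched roughly as follows. This is an illustrative Python model, not the server's C++ code: `FetchOutcome`, `classify_first_batch`, and the use of plain integers as oplog timestamps are all hypothetical. The mismatch branch is included only for contrast and assumes divergence would show up as a first returned entry that differs from the node's own last entry.

```python
from enum import Enum
from typing import List


class FetchOutcome(Enum):
    """Hypothetical outcomes for the first batch of a remote oplog query."""
    CONTINUE = "continue"
    NEW_SYNC_SOURCE = "new_sync_source"
    CHECK_ROLLBACK = "check_rollback"


def classify_first_batch(last_local_ts: int, remote_batch: List[int]) -> FetchOutcome:
    """Classify the result of querying the sync source for ts >= last_local_ts.

    remote_batch holds the timestamps of the returned entries, oldest first.
    """
    if not remote_batch:
        # Nothing on the remote is at or after our last entry: the remote
        # is behind us (the RemoteOplogStale case), so pick another source.
        return FetchOutcome.NEW_SYNC_SOURCE
    if remote_batch[0] != last_local_ts:
        # Assumption: the remote has newer entries but not our last entry,
        # so the histories may have diverged and rollback may be needed.
        return FetchOutcome.CHECK_ROLLBACK
    # The remote contains our last entry and possibly more: keep fetching.
    return FetchOutcome.CONTINUE
```

Under this model, an empty batch never leads to a rollback check on its own; only a non-empty batch that lacks the node's last entry does.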
| Comments |
| Comment by Spencer Brody (Inactive) [ 29/Dec/16 ] |
|
We have to be careful here. If we never roll back on RemoteOplogStale, we could wind up spinning looking for a sync source and not rolling back when we need to. We should probably figure out a unified solution for this and |
| Comment by Eric Milkie [ 23/Sep/16 ] |
|
Scott, you just described the exact situation that occurs when the resync command runs. So switching sync sources would be the correct thing to do in that case. You would never want to roll back in such a situation, as you can always wait until some node gets ahead of you again (and then decide for real if you should roll back). |
| Comment by Scott Hernandez (Inactive) [ 22/Sep/16 ] |
|
The sync source choosing code only returns a member to sync from that is ahead, so this behavior is correct unless the upstream node goes back in time/oplog-stream. |
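
The invariant mentioned in this comment, that only members ahead of the current node are eligible, might be modeled as below. This is a hedged sketch: `choose_sync_source`, the integer timestamps, and the "most advanced member wins" tie-break are assumptions made for illustration and do not reflect the server's actual selection criteria.

```python
from typing import Dict, Optional


def choose_sync_source(my_last_ts: int, candidates: Dict[str, int]) -> Optional[str]:
    """Return the name of a member strictly ahead of us, or None.

    candidates maps a member name to the timestamp of its newest oplog entry.
    """
    # Only members whose newest entry is newer than ours are eligible.
    ahead = {name: ts for name, ts in candidates.items() if ts > my_last_ts}
    if not ahead:
        # No member is ahead: wait and retry later instead of rolling back.
        return None
    # Hypothetical tie-break: prefer the most advanced member.
    return max(ahead, key=lambda name: ahead[name])
```

For example, `choose_sync_source(10, {"a": 9, "b": 12})` selects `"b"`, while `choose_sync_source(10, {"a": 9, "b": 10})` returns `None`, which corresponds to waiting until some node gets ahead again.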