[SERVER-51049] Cannot assume recovery timestamp can be found in oplog Created: 18/Sep/20 Updated: 29/Oct/23 Resolved: 07/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2020-10-19, Repl 2020-11-02, Repl 2020-11-16, Repl 2020-11-30, Repl 2020-12-14 |
| Participants: |
| Linked BF Score: | 15 |
| Description |
|
During recovery, we assume the oplogApplicationStartPoint (which derives from the stable timestamp) can be found in the oplog. After the PM-1713 ... It is also the case that the oplog query may return no oplog entries, so the check here is also incorrect and should be replaced with a log message. |
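The empty-result case described above can be illustrated with a minimal sketch. This is not the server code: the function name `entries_to_apply`, the `(seconds, increment)` tuple representation of a timestamp, and the "at or after the start point" query are all assumptions made for illustration. The only point is that an empty result is reported with a log message rather than treated as a failed check.

```python
import logging

logger = logging.getLogger("recovery_sketch")  # hypothetical logger name


def entries_to_apply(oplog, start_point):
    """Return the oplog entries at or after start_point (hypothetical helper).

    oplog is a list of dicts with a "ts" key holding a (seconds, increment)
    tuple; start_point is a tuple of the same shape. Both are assumptions.
    """
    pending = [entry for entry in oplog if entry["ts"] >= start_point]
    if not pending:
        # An empty result is a legitimate outcome, so log it and carry on
        # instead of asserting that at least one entry exists.
        logger.info("No oplog entries found at or after %s", start_point)
    return pending
```

With this shape, a caller simply has nothing to apply when the list is empty.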
| Comments |
| Comment by Githook User [ 03/Dec/20 ] |
|
Author: Matthew Russotto (mtrussotto) <matthew.russotto@mongodb.com> Message: |
| Comment by Matthew Russotto [ 12/Nov/20 ] |
|
This is still going to fail after The correct oplogApplicationTimestamp is the timestamp of the first oplog entry at or before the stable timestamp, which is guaranteed to exist by |
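A rough sketch of the lookup described in the comment above, reading "first oplog entry at or before the stable timestamp" as the most recent entry whose timestamp does not exceed the stable timestamp. That reading, the function name `oplog_application_start`, and the `(seconds, increment)` tuple representation are assumptions for illustration, not the server's implementation.

```python
from bisect import bisect_right


def oplog_application_start(oplog_timestamps, stable_timestamp):
    """Return the latest oplog timestamp <= stable_timestamp, or None.

    oplog_timestamps must be sorted ascending; each timestamp is a
    (seconds, increment) tuple (an assumed representation).
    """
    # bisect_right gives the count of timestamps <= stable_timestamp.
    idx = bisect_right(oplog_timestamps, stable_timestamp)
    if idx == 0:
        # No entry at or before the stable timestamp exists in this list.
        return None
    return oplog_timestamps[idx - 1]
```

For example, with entries at `(10, 1)`, `(10, 2)` and `(12, 1)` and a stable timestamp of `(11, 0)`, which is not itself in the oplog, the sketch selects `(10, 2)`.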
| Comment by Matthew Russotto [ 10/Nov/20 ] |
|
After |
| Comment by Daniel Gottlieb (Inactive) [ 24/Sep/20 ] |
|
I don't have enough information to guess why we haven't seen this yet. I'm on board with the idea that we can select a stable timestamp with an "i: 0", but I'd need a more detailed explanation of how we got there. Backing out from the observation of an "i: 0" stable timestamp:
|
| Comment by Matthew Russotto [ 24/Sep/20 ] |
|
It only happens when we get a crash and the stable timestamp happens to not be in the oplog. I'm not sure how often we should expect that to happen.
|
| Comment by Judah Schvimer [ 23/Sep/20 ] |
|
matthew.russotto, this makes sense to me. I don't think this is limited to EMRC=F (which is where the BF occurred). I'm curious, though, why we haven't seen more occurrences of this. Any ideas? CC daniel.gottlieb |