[SERVER-30873] Investigate where the oplog truncation happens during replication recovery Created: 29/Aug/17  Updated: 27/Oct/23  Resolved: 12/Sep/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Judah Schvimer
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Operating System: ALL
Backport Requested:
v3.4
Sprint: Repl 2017-10-02
Participants:

 Description   

The comments for ReplicationRecovery::_truncateOplogTo indicate that it should truncate entries after and including 'truncateTimestamp'.

The implementation, however, finds the first oplog entry less than the 'truncateTimestamp', then tells the storage engine to truncate the oplog starting at that entry, passing true for the 'inclusive' flag. This means it will actually truncate starting at the entry one before the 'truncateTimestamp' that was passed in.

We need to audit exactly how and when the oplogTruncateAfterPoint is set and make sure we're truncating at the right points.
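Purely for illustration, here is a minimal standalone sketch of the reading described above, not the actual server code: oplog entries are modeled as integer timestamps in a vector, and the function name `suspectedTruncate` is hypothetical. As the comments below explain, the real implementation tracks `oldestIDToDelete` and does not behave this way.

```cpp
#include <cassert>
#include <vector>

// Sketch of the suspected behavior: find the newest entry strictly less than
// truncateTimestamp and truncate from there, inclusive.
void suspectedTruncate(std::vector<int>& oplog, int truncateTimestamp) {
    for (int i = static_cast<int>(oplog.size()) - 1; i >= 0; --i) {
        if (oplog[i] < truncateTimestamp) {
            // Truncate starting at this entry, inclusive.
            oplog.erase(oplog.begin() + i, oplog.end());
            return;
        }
    }
}

int main() {
    std::vector<int> oplog{1, 2, 3};
    suspectedTruncate(oplog, 2);  // intended result: [1]
    // Under this reading, the entry just before the truncate point is also
    // removed, leaving [] instead of [1].
    assert(oplog.empty());
    return 0;
}
```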



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 12/Sep/17 ]

That makes sense. While integrating rollback via recovery I was observing some problem in this area, but perhaps I misdiagnosed it. I'm with Eric; let's close this for now, and I'm sure that if there's a real problem it will show up again when the project is revisited.

Comment by Eric Milkie [ 12/Sep/17 ]

There might still be some bug here, as I believe the problem involves the intersection of the stable timestamp and the truncation point. The unit test just hard-codes a value for the truncate-after point, so while the logic is confirmed correct for the truncation point itself, there may still be a problem at a higher layer. I think we can resolve this particular ticket, though, and open a new ticket if we hit the problem again.

Comment by Spencer Brody (Inactive) [ 12/Sep/17 ]

Makes sense to me. This came up because of an issue daniel.gottlieb was having in his testing. Dan, it seems like there's no bug here after all. Do your findings disagree with that?

FYI milkie

Comment by Judah Schvimer [ 12/Sep/17 ]

I believe this is actually fine and the code is doing what it's supposed to be doing as evidenced by this unittest: https://github.com/mongodb/mongo/blob/bddfee0d34513631424363645139f8eb4acbbc4c/src/mongo/db/repl/replication_recovery_test.cpp#L263-L279

The code is a bit confusing, but we truncate the oplog at `oldestIDToDelete`, not at the current RecordID we're looking at. We search backwards through the oplog, checking at each entry whether we're past the `truncateTimestamp`. If we are not past the `truncateTimestamp` (i.e., the entry is greater than or equal to it), we set `oldestIDToDelete` to the current oplog entry. If we are past the `truncateTimestamp` (i.e., the entry is strictly less than it), we do NOT set `oldestIDToDelete` again, and we pass the previous `oldestIDToDelete` into `cappedTruncateAfter`; that entry must be greater than or equal to `truncateTimestamp`.

Example: we have an oplog [1,2,3] and we want to truncate at 2, yielding [1] afterwards. We begin by looking at 3. 3 is not less than 2, so we set `oldestIDToDelete` to 3. We then look at 2. 2 is not less than 2, so we set `oldestIDToDelete` to 2. We then look at 1. 1 is less than 2, so we truncate at `oldestIDToDelete` (inclusive), which is 2. This leaves our desired oplog of [1].
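For reference, here is a minimal standalone sketch of that backward scan. The oplog is modeled as a vector of integer timestamps; `oldestIDToDelete` mirrors the name used in the real code, while the free function `truncateOplogTo` and the vector model are simplifications, not the actual implementation.

```cpp
#include <cassert>
#include <vector>

// Remove every entry with timestamp >= truncateTimestamp, mimicking the
// backward scan described above.
void truncateOplogTo(std::vector<int>& oplog, int truncateTimestamp) {
    int oldestIndexToDelete = -1;  // stand-in for oldestIDToDelete

    // Scan backwards from the newest entry.
    for (int i = static_cast<int>(oplog.size()) - 1; i >= 0; --i) {
        if (oplog[i] < truncateTimestamp) {
            // We are now past (older than) the truncate point; stop. The
            // previously recorded entry is the oldest one >= truncateTimestamp.
            break;
        }
        oldestIndexToDelete = i;
    }

    if (oldestIndexToDelete >= 0) {
        // Analogous to cappedTruncateAfter with inclusive == true: delete this
        // entry and everything newer.
        oplog.erase(oplog.begin() + oldestIndexToDelete, oplog.end());
    }
}

int main() {
    // The worked example above: oplog [1, 2, 3], truncate at 2 -> [1].
    std::vector<int> oplog{1, 2, 3};
    truncateOplogTo(oplog, 2);
    assert(oplog == std::vector<int>({1}));
    return 0;
}
```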

spencer, do you agree with this reading and does the unittest suffice?

Comment by Spencer Brody (Inactive) [ 29/Aug/17 ]

This appears to be an issue on older versions as well.
