[SERVER-51158] Must not truncate entire oplog before truncate point Created: 25/Sep/20  Updated: 29/Oct/23  Resolved: 12/Nov/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.9.0-alpha0

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File node_truncates_entire_stable_oplog.js    
Issue Links:
Depends
Problem/Incident
causes SERVER-56590 Oplog truncation should correctly han... Closed
Related
related to SERVER-51049 Cannot assume recovery timestamp can ... Closed
related to SERVER-54666 Use earlier oplog entry if recovery t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-10-19, Repl 2020-11-02, Repl 2020-11-16
Participants:
Linked BF Score: 28

 Description   

As a result of the Remove Stable Optime Candidates List (PM-1713) project, it is possible to have a case where there are no oplog entries before the oplog truncate after point (computed from the all-durable timestamp, which is at or after the stable optime candidate). This happens when an oplog hole is open long enough that the size of the oplog entries after the hole is bigger than the configured oplog size, and so all entries prior to the stable timestamp get truncated.

We cannot handle this case; we need an oplog entry before the truncate point to know when to start fetching. So we must ensure during truncation that we leave a record at or before the truncate point.
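
To make the failure mode concrete, here is a minimal Python sketch, not the server code. It assumes integer timestamps, a purely size-based cap, and the hypothetical helper names `cap_by_size` and `has_entry_at_or_before`. A hole at timestamp 3 pins the truncate point at 2 while later entries keep arriving, and naive capping then removes every entry at or before that point.

```python
from typing import List, NamedTuple


class OplogEntry(NamedTuple):
    ts: int    # simplified integer timestamp (assumption for illustration)
    size: int  # entry size in bytes


def cap_by_size(entries: List[OplogEntry], max_size: int) -> List[OplogEntry]:
    """Naive capping: drop the oldest entries until the total size fits."""
    kept = list(entries)
    total = sum(e.size for e in kept)
    while kept and total > max_size:
        total -= kept.pop(0).size
    return kept


def has_entry_at_or_before(entries: List[OplogEntry], truncate_point: int) -> bool:
    """True if some entry remains from which fetching could start."""
    return any(e.ts <= truncate_point for e in entries)


# Hole at ts=3: the truncate point stays at 2 while entries 4..9 are written.
oplog = [OplogEntry(ts, 10) for ts in (1, 2, 4, 5, 6, 7, 8, 9)]
capped = cap_by_size(oplog, max_size=50)

print([e.ts for e in capped])
print(has_entry_at_or_before(capped, truncate_point=2))
```

In this model the capped oplog keeps only entries after the hole, so the check for an entry at or before the truncate point fails, which is the situation the ticket says must be prevented.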
 



 Comments   
Comment by Matthew Russotto [ 19/Feb/21 ]

nb: There are two "truncate points", the replication truncate-after point and the storage mayTruncateUpTo point, which is limited by a number of things, including that it must be no later than the stable timestamp. The mayTruncateUpTo point is always earlier than the truncate-after point. The CR ensures there is always at least one oplog entry less than or equal to the mayTruncateUpTo point.
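
The guarantee described in this comment can be sketched as follows. This is an illustrative Python model under stated assumptions, not the actual storage-layer change: timestamps are sorted integers, `other_limits` stands in for the unspecified additional constraints on the mayTruncateUpTo point, and `removable_prefix` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence


def removable_prefix(
    oplog_ts: List[int], stable_ts: int, other_limits: Sequence[int] = ()
) -> List[int]:
    """Return the timestamps that may be truncated from the front of the oplog.

    oplog_ts must be sorted ascending. The newest entry at or before the
    mayTruncateUpTo point is always retained.
    """
    # The mayTruncateUpTo point can be no later than the stable timestamp
    # or any other limit (the other limits are placeholders here).
    may_truncate_up_to = min([stable_ts, *other_limits])
    # Index one past the newest entry with ts <= may_truncate_up_to.
    idx = bisect_right(oplog_ts, may_truncate_up_to)
    # Keep that newest entry; only the entries before it are removable.
    return oplog_ts[: max(idx - 1, 0)]


print(removable_prefix([1, 2, 4, 5], stable_ts=3))
```

With entries at 1, 2, 4, 5 and a stable timestamp of 3, only entry 1 is removable, so entry 2 survives as the record at or before the mayTruncateUpTo point.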

Comment by Githook User [ 12/Nov/20 ]

Author: Matthew Russotto (mtrussotto) <matthew.russotto@mongodb.com>

Message: SERVER-51158 Must not truncate entire oplog before truncate point.
Branch: master
https://github.com/mongodb/mongo/commit/1bdd251a274205c3af283696f443643894c209e3

Comment by Matthew Russotto [ 12/Nov/20 ]

I have attached a reproducer for the record; I believe it is far too fragile to include in the test suites.

Comment by Matthew Russotto [ 04/Nov/20 ]

Finally managed to reproduce the bug. While it can happen even with EMRC (enableMajorityReadConcern) true, it also requires writeConcernMajorityJournalDefault: false on a single-voting-node replica set. If the replica set has multiple voting nodes, the majority cannot advance past the last real oplog entry and thus this can't happen. If writeConcernMajorityJournalDefault is true, the majority write concern cannot advance until we're past the hole.
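
As a rough illustration of the two conditions named above, the following Python sketch checks a replica set configuration document for them. The function name `matches_repro_conditions` is hypothetical. The sketch assumes the usual config defaults (a member votes unless `votes` is 0, and `writeConcernMajorityJournalDefault` is true unless set otherwise), and it does not attempt to model the oplog hole itself.

```python
from typing import Any, Dict


def matches_repro_conditions(rs_config: Dict[str, Any]) -> bool:
    """Check the two configuration preconditions described in the comment."""
    members = rs_config.get("members", [])
    # Assumed default: a member has one vote unless configured otherwise.
    voting = [m for m in members if m.get("votes", 1) > 0]
    # Assumed default: the setting is true when absent from the config.
    journal_default = rs_config.get("writeConcernMajorityJournalDefault", True)
    return len(voting) == 1 and journal_default is False


single_node = {
    "_id": "rs0",
    "members": [{"_id": 0, "host": "localhost:27017"}],
    "writeConcernMajorityJournalDefault": False,
}
print(matches_repro_conditions(single_node))
```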

Generated at Thu Feb 08 05:24:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.