[SERVER-51049] Cannot assume recovery timestamp can be found in oplog Created: 18/Sep/20  Updated: 29/Oct/23  Resolved: 07/Dec/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-54666 Use earlier oplog entry if recovery t... Closed
is related to SERVER-51158 Must not truncate entire oplog before... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-10-19, Repl 2020-11-02, Repl 2020-11-16, Repl 2020-11-30, Repl 2020-12-14
Participants:
Linked BF Score: 15

 Description   

During recovery, we assume the oplogApplicationStartPoint (which derives from the stable timestamp) is present in the oplog:
https://github.com/mongodb/mongo/blob/2c446f3f587f406c23cdfca87f227ee5cd466fa8/src/mongo/db/repl/replication_recovery.cpp#L165

After the PM-1713 (Remove Stable Optime Candidates List) project, this assumption no longer holds. If the oplogApplicationStartPoint is not present in the oplog, that's OK. In this case we must apply every entry returned by the oplog query (i.e. we do NOT skip the first entry).

It is also the case that the oplog query may return no oplog entries, so the check here

https://github.com/mongodb/mongo/blob/2c446f3f587f406c23cdfca87f227ee5cd466fa8/src/mongo/db/repl/replication_recovery.cpp#L149

is also incorrect and should be replaced with a log message.
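
As a rough illustration of the behavior described above, here is a minimal, self-contained C++ sketch. It is not the code in replication_recovery.cpp: `Timestamp`, `OplogEntry`, `entriesToApply`, and the log text are hypothetical stand-ins, and the oplog query is modeled as a vector of entries at or after the start point, in ascending order.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical stand-in for an oplog timestamp: (seconds, increment).
struct Timestamp {
    std::uint32_t secs;
    std::uint32_t inc;
    bool operator==(const Timestamp& other) const {
        return secs == other.secs && inc == other.inc;
    }
};

// Hypothetical stand-in for an oplog entry; only the timestamp matters here.
struct OplogEntry {
    Timestamp ts;
};

// `queried` models the result of an oplog query for entries at or after
// `startPoint`, in ascending timestamp order.
std::vector<OplogEntry> entriesToApply(const std::vector<OplogEntry>& queried,
                                       const Timestamp& startPoint) {
    if (queried.empty()) {
        // An empty result is legal: log it instead of treating it as fatal.
        std::clog << "No oplog entries to apply for recovery\n";
        return {};
    }
    auto first = queried.begin();
    // Skip the first entry only when it is exactly the start point. If the
    // start point has no oplog entry, every returned entry must be applied.
    if (first->ts == startPoint) {
        ++first;
    }
    return std::vector<OplogEntry>(first, queried.end());
}
```

With entries at (5,0), (5,1), and (6,0), a start point of (5,0) leaves two entries to apply, while a start point of (4,9), which has no oplog entry of its own, leaves all three.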



 Comments   
Comment by Githook User [ 03/Dec/20 ]

Author: Matthew Russotto (username: mtrussotto, email: matthew.russotto@mongodb.com)

Message: SERVER-51049 Cannot assume recovery timestamp can be found in oplog
Branch: master
https://github.com/mongodb/mongo/commit/0bd12794bfe9a109747e17f6a022664a002c47de

Comment by Matthew Russotto [ 12/Nov/20 ]

This is still going to fail after SERVER-51158 because while we are assured there is an oplog entry at or before the stable timestamp, the code assumes the stable timestamp itself has an oplog entry.

https://github.com/mongodb/mongo/blob/c575750f73b7a490a60919777dc49c45ec4f2e0c/src/mongo/db/repl/replication_recovery.cpp#L481

The correct oplogApplicationTimestamp is the timestamp of the first oplog entry at or before the stable timestamp, which is guaranteed to exist by SERVER-51158, so we can fix it with one more oplog lookup to find that entry.
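
The extra lookup can be pictured with a small C++17 sketch. This is only an illustration under stated assumptions, not the actual fix: the oplog is modeled as a sorted in-memory vector of timestamps, and `OplogTimestamp` and `findStartPointAtOrBefore` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an oplog timestamp: (seconds, increment).
struct OplogTimestamp {
    std::uint32_t secs;
    std::uint32_t inc;
};

// Order by seconds first, then by increment.
bool operator<(const OplogTimestamp& a, const OplogTimestamp& b) {
    return std::tie(a.secs, a.inc) < std::tie(b.secs, b.inc);
}

// Returns the timestamp of the last oplog entry at or before `stable`.
// `oplog` must be sorted in ascending order. Returns std::nullopt when every
// entry is later than `stable`, which SERVER-51158 is meant to rule out.
std::optional<OplogTimestamp> findStartPointAtOrBefore(
    const std::vector<OplogTimestamp>& oplog, const OplogTimestamp& stable) {
    // upper_bound finds the first entry strictly greater than `stable`;
    // the entry just before it, if any, is the one at or before `stable`.
    auto it = std::upper_bound(oplog.begin(), oplog.end(), stable);
    if (it == oplog.begin()) {
        return std::nullopt;
    }
    return *std::prev(it);
}
```

For an oplog holding (5,0), (5,2), and (7,0), a stable timestamp of (5,1) resolves to (5,0), and a stable timestamp of (5,2) resolves to itself.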

Comment by Matthew Russotto [ 10/Nov/20 ]

After SERVER-51158 we know there is an oplog entry before the stable timestamp, but I need to determine whether or not the oplogApplicationStartPoint is definitely there.

Comment by Daniel Gottlieb (Inactive) [ 24/Sep/20 ]

I don't have enough information to take a guess as to why we haven't seen this yet. I'm on board with the idea that we can select a stable timestamp with an "i: 0", but I'd need a more detailed explanation of how we got there. Backing out from the observation of an "i: 0" stable timestamp:

  • I assume a (secondary) node heard of a commit point of "i: 1"
  • And I assume its lastApplied was also "i: 1"
  • But an open WT transaction had a (durable) timestamp of "i: 1". I don't think this is commonplace, but I believe code paths such as committing a prepared transaction can get us into this state.
  • We hit this code which will hold back the stable timestamp. Even though the behavior is only required for single node replica sets, I can see this having an effect on multi-node replica sets.

Comment by Matthew Russotto [ 24/Sep/20 ]

It only happens when we get a crash and the stable timestamp happens to not be in the oplog. I'm not sure how often we should expect that to happen.

Comment by Judah Schvimer [ 23/Sep/20 ]

matthew.russotto, this makes sense to me. I don't think this is limited to EMRC=F (which is where the BF occurred). I'm curious, though, why we haven't seen more occurrences of this. Any ideas? CC daniel.gottlieb
