
Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.7.3
Affects Version/s: None
Component/s: Replication, Storage
Labels: rollback-functional
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-02-26, Repl 2018-03-12, Repl 2018-03-26
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

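Replication recovery decides where to start replaying the oplog from the value stored in the `checkpointTimestamp` document. However, that value may be stale relative to the timestamp of the actual checkpoint, so the data on disk can already reflect writes later than the document records.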

This leads to two problems:

- Replication recovery has to relax constraints because it may re-apply entries from a time earlier than the data currently represents. The system currently has no way to distinguish between "being applied for the first time" and "maybe being applied a subsequent time". Relaxing these constraints risks breaking the typical execution path.
- Reading at a timestamp between the stale checkpoint timestamp and the actual data time can return incorrect results. Furthermore, because the actual data time cannot be known, the earliest read time that can be correctly satisfied is the timestamp of the last op applied during recovery.
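The first problem can be illustrated with a toy model. This is a minimal sketch, not MongoDB code: `OplogEntry`, `Collection`, `replay`, and `DuplicateKeyError` are hypothetical names, timestamps are plain integers, and replay is assumed to apply entries strictly after its start timestamp. The data reflects writes through ts=10 while the `checkpointTimestamp` document still says 7, so replay re-applies entries 8 through 10: the strict path fails on a duplicate key, and the relaxed path succeeds but cannot tell a repeated application from a first one.

```python
from dataclasses import dataclass


class DuplicateKeyError(Exception):
    pass


@dataclass(frozen=True)
class OplogEntry:
    ts: int      # simplified: a Timestamp as a plain integer
    doc_id: int  # _id of the document this (hypothetical) entry inserts


class Collection:
    def __init__(self):
        self.docs = {}  # _id -> timestamp of the insert

    def insert(self, entry, relaxed=False):
        if entry.doc_id in self.docs and not relaxed:
            # Normal (strict) path: inserting an existing _id is an error.
            raise DuplicateKeyError(entry.doc_id)
        # Relaxed path: behave like an upsert, so a repeated insert is
        # silently accepted, and a genuine bug would be hidden the same way.
        self.docs[entry.doc_id] = entry.ts


def replay(coll, oplog, start_ts, relaxed=False):
    """Apply entries with ts > start_ts in order; return the last applied ts."""
    last_applied = start_ts
    for entry in oplog:
        if entry.ts > start_ts:
            coll.insert(entry, relaxed=relaxed)
            last_applied = entry.ts
    return last_applied


oplog = [OplogEntry(ts=t, doc_id=t) for t in range(1, 11)]

# The data on disk actually reflects every write through ts=10 ...
coll = Collection()
for entry in oplog:
    coll.insert(entry)
# ... but the checkpointTimestamp document still says 7.
stale_checkpoint_ts = 7

try:
    replay(coll, oplog, stale_checkpoint_ts)
except DuplicateKeyError as err:
    print("strict replay failed on _id", err.args[0])  # prints: strict replay failed on _id 8

last_applied = replay(coll, oplog, stale_checkpoint_ts, relaxed=True)
print(last_applied)  # 10: the earliest read time known to be correct
```

With an accurate start timestamp (10 here), the strict path would apply nothing and succeed, which is what the work below aims for.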

The work to be done for this ticket:

- Have `recoverToStableTimestamp` return a `StatusWith<Timestamp>`. Rollback uses this value to determine where to start replication recovery from. Amend KVStorageEngine/KVEngine/WiredTigerKVEngine to match the new return type.
- Add a method to the storage engine interface that returns the "data timestamp" on disk. Startup uses this value to determine where to start replication recovery from.
- Have replication recovery query and use these timestamp values to determine where to start applying oplog entries from.
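The intended control flow can be sketched in the same toy model. This is a minimal sketch under stated assumptions, not the actual server change: `StatusWithTimestamp` is a simplified stand-in for `StatusWith<Timestamp>`, and `HypotheticalStorageEngine`, `recover_to_stable_timestamp`, `get_data_timestamp`, and `entries_to_apply` are hypothetical names. Both paths take the start point from the storage engine itself rather than from a separately persisted document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StatusWithTimestamp:
    """Simplified stand-in for StatusWith<Timestamp>: a value on success, else an error."""
    ok: bool
    value: Optional[int] = None  # simplified: a Timestamp as a plain integer
    reason: str = ""


class HypotheticalStorageEngine:
    """Toy engine that tracks which timestamp its data reflects."""

    def __init__(self, stable_ts: Optional[int], data_ts: int):
        self._stable_ts = stable_ts
        self._data_ts = data_ts

    def recover_to_stable_timestamp(self) -> StatusWithTimestamp:
        # Rollback path: roll data back to the stable timestamp and report
        # the timestamp the data now reflects.
        if self._stable_ts is None:
            return StatusWithTimestamp(ok=False, reason="no stable timestamp")
        self._data_ts = self._stable_ts
        return StatusWithTimestamp(ok=True, value=self._data_ts)

    def get_data_timestamp(self) -> int:
        # Startup path: the timestamp the data on disk actually reflects.
        return self._data_ts


def entries_to_apply(oplog_ts: List[int], start_ts: int) -> List[int]:
    """Oplog timestamps replication recovery applies: strictly after start_ts."""
    return [ts for ts in oplog_ts if ts > start_ts]


# Assumed: the local oplog already holds only the entries to be kept.
oplog = list(range(1, 13))

# Startup: the last checkpoint on disk reflects ts=10.
restarted = HypotheticalStorageEngine(stable_ts=10, data_ts=10)
print(entries_to_apply(oplog, restarted.get_data_timestamp()))  # [11, 12]

# Rollback: data is at ts=12 but the stable timestamp is 7.
rolling_back = HypotheticalStorageEngine(stable_ts=7, data_ts=12)
sw = rolling_back.recover_to_stable_timestamp()
if sw.ok:
    print(entries_to_apply(oplog, sw.value))  # [8, 9, 10, 11, 12]
```

Returning the timestamp alongside the status lets the caller handle a failed recovery explicitly instead of falling back to a possibly stale value.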

is depended on by
- SERVER-30464 Edit startup warning when running replset member as standalone to mention that data may look inconsistent (Closed)
- SERVER-33348 Remove checkpoint timestamp collection (Closed)
- SERVER-33349 Add command to get stable checkpoint timestamp (Closed)

is duplicated by
- SERVER-32304 Amend ReplicationRecovery code to reflect changes to design (Closed)

related to
- SERVER-47844 Update _setStableTimestampForStorage to set the stable timestamp without using the stable optime candidates set when EMRC=true (Closed)

Assignee: Judah Schvimer
Reporter: Daniel Gottlieb (Inactive)
Participants: Daniel Gottlieb, Githook User, Judah Schvimer
Votes: 0
Watchers: 3

Created: Feb 13 2018 04:36:02 PM UTC
Updated: Oct 29 2023 10:34:47 PM UTC
Resolved: Mar 12 2018 07:38:06 PM UTC
