[SERVER-48603] Rollback via refetch can result in out of order timestamps Created: 05/Jun/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-48518 Rollback via refetch (EMRC = false) c... Closed
is related to SERVER-48453 Lazily initialize a record store's au... Closed
is related to WT-6388 Fix-up out-of-order updates in the hi... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

This is a hypothetical that I'm documenting for investigation. The idea was inspired by suganthi.mani's discovery in SERVER-48518. Unlike SERVER-48518, this out of order timestamp condition is only problematic when WT is running with durable history.

Consider the following sequence where a primary accepts some writes, rolls them back, and then as a secondary replicates writes that use the same key-space.

  • As primary
    Insert {_id: 1, doc: "A"} @ TS 20 RecordId(5)
    Delete {_id: 1, doc: "A"} @ TS 30
  • Stepdown, perform rollback via refetch back to TS 10
    Rolls back delete – refetch {_id: 1} from sync source. The document does not exist – do nothing
    Rolls back insert – the {_id: 1} document doesn't exist in the index – do nothing
  • Become secondary
    Replicate Insert {_id: 1, doc: "B"} @ TS 15 RecordId(6) (corrected from RecordId(5); see comments)

In this state, there are two update chains with out of order timestamps:
RecordStore: RecordId(5) V3(15) -> V2(30) -> V1(20) (retracted; see comments)
Index: KeyString(1) V3(15) -> V2(30) -> V1(20)

Note the out of order updates in the RecordStore case are not alleviated by SERVER-48453 as this problem would still exist without lazy-initialization.
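
To make the sequence above concrete, here is a minimal Python sketch of the _id index update chain. It is a toy model, not WiredTiger code: the Update and UpdateChain classes and the out_of_order_pairs helper are hypothetical names introduced only for illustration, and the rollback step is modelled as writing nothing because both of its sub-steps are described as "do nothing".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Update:
    ts: int                # commit timestamp
    value: Optional[str]   # None models a tombstone (delete)


@dataclass
class UpdateChain:
    # Newest update first, mirroring "V3 -> V2 -> V1" in the description.
    updates: List[Update] = field(default_factory=list)

    def prepend(self, ts: int, value: Optional[str]) -> None:
        self.updates.insert(0, Update(ts, value))

    def out_of_order_pairs(self) -> List[Tuple[int, int]]:
        """Return (newer_ts, older_ts) pairs where the newer update has the smaller timestamp."""
        pairs = zip(self.updates, self.updates[1:])
        return [(n.ts, o.ts) for n, o in pairs if n.ts < o.ts]


index_chain = UpdateChain()
index_chain.prepend(20, "A")    # as primary: insert {_id: 1, doc: "A"} @ TS 20
index_chain.prepend(30, None)   # as primary: delete @ TS 30
# Rollback via refetch to TS 10: both steps are no-ops, so nothing is written.
index_chain.prepend(15, "B")    # as secondary: replicated insert @ TS 15

timestamps = [u.ts for u in index_chain.updates]
violations = index_chain.out_of_order_pairs()
print(timestamps, violations)
```

The single reported pair is the TS 15 update sitting on top of the TS 30 delete, which is the out of order condition this ticket describes.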



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 08/Jun/20 ]

I don't know if that's actually a thing. alexander.gorrod?

Comment by Judah Schvimer [ 08/Jun/20 ]

daniel.gottlieb, would it be possible to turn off durable history when using eMRC=false?

Comment by Daniel Gottlieb (Inactive) [ 05/Jun/20 ]

I misdiagnosed the problem on the record store. Because RollbackViaRefetch does not restart the catalog, the next id doesn't reset to an earlier value in this case. So the problem within a process lifetime is limited to the _id index. That can be solved with a MongoDB-only fix of regenerating the _id index key and throwing an untimestamped delete on top of the update chain.
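
A rough Python sketch of why an untimestamped delete on the _id index key could help. This is a toy model under a loud assumption: it treats an untimestamped update (modelled as timestamp 0) as a barrier, so that only updates written after it are checked for ordering. Whether WiredTiger behaves this way is not confirmed in this ticket, and out_of_order_pairs is a hypothetical helper.

```python
from typing import List, Tuple

UNTIMESTAMPED = 0  # modelling assumption: 0 stands for "no timestamp"


def out_of_order_pairs(chain: List[int]) -> List[Tuple[int, int]]:
    """chain lists commit timestamps newest-first.

    Assumption of this toy model: an untimestamped update acts as a barrier,
    so only updates newer than the latest barrier are checked.
    """
    if UNTIMESTAMPED in chain:
        # index() finds the first, i.e. newest, barrier in a newest-first list.
        chain = chain[:chain.index(UNTIMESTAMPED)]
    return [(n, o) for n, o in zip(chain, chain[1:]) if n < o]


# _id index key for {_id: 1} after the primary's insert @ 20 and delete @ 30.
without_fix = [30, 20]
with_fix = [UNTIMESTAMPED] + without_fix   # rollback adds an untimestamped delete

# The secondary then replicates the insert @ TS 15 onto each chain.
without_fix = [15] + without_fix
with_fix = [15] + with_fix

before = out_of_order_pairs(without_fix)
after = out_of_order_pairs(with_fix)
print(before, after)
```

Under that assumption, the chain without the fix still reports the TS 15 over TS 30 violation, while the chain with the untimestamped delete reports none.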

Comment by Daniel Gottlieb (Inactive) [ 05/Jun/20 ]

Brainstorming some solutions (including the bad ones). The common goal across all of them is either:
A) Prune all of the update chains in front of the common point.
B) Write a 0-timestamped tombstone on each WT document touched in the operations being rolled back. Note: only writing tombstones is of interest. If the correct post-state is a non-tombstone, rollback via refetch is guaranteed to bring the data into that state.

  • Disallow eMRC=false (effectively pruning, A)
    Uncertain if this will happen in 4.4
  • After RvR force back stable_timestamp to the common point (that's already done). Call rollback_to_stable (A)
    Does this work? Would this also fix SERVER-38925?
  • If RvRefetching the delete finds nothing - can read at TS(delete)-1 to find the record and restore it with a 0 timestamp write. (B)
    If TS(delete) > oldest_timestamp, this works?
    If TS(delete) < oldest_timestamp, no guarantee WT has wiped the history:
    Try after forcing back the oldest_timestamp to the common point? Only works if WT has kept the index entry. If the index entry was pruned, but not the record itself, the record will dangle.
  • If the above strategy for rolling back the delete doesn't work: when RvRefetching the insert, generate the _id keystring, then insert + delete it at TS(0) (B)
    Don't know the RecordId -> Doesn't work on the record store document, nor secondary indexes
  • (Eagerly) Crash (A)
    Works iff calling rollback_to_stable works?
  • Optimistically proceed with "unstable" update chains. Have WT WT_ROLLBACK/WriteConflictException on out of order timestamps. (A)
    Would work if the node became a primary. Would (lazily) crash a secondary.
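
The two (B) strategies for a delete whose refetch finds nothing can be sketched as a small decision helper. This is only an illustration of the brainstorm above, which itself marks both branches as open questions. plan_delete_rollback and the returned strategy names are hypothetical labels, not MongoDB or WiredTiger identifiers, and treating TS(delete) == oldest_timestamp as the fallback case is an assumption.

```python
def plan_delete_rollback(delete_ts: int, oldest_ts: int) -> str:
    """Pick a strategy for rolling back a delete whose refetch found nothing.

    Hypothetical helper: the strategy names are labels for the bullets above.
    """
    read_ts = delete_ts - 1  # last timestamp at which the document was still visible
    if read_ts >= oldest_ts:
        # Equivalent to TS(delete) > oldest_timestamp: history at read_ts
        # should still be readable, so restore with a 0-timestamp write.
        return "read_at_delete_minus_1_and_restore"
    # History may already be gone: fall back to handling the insert's _id key.
    return "insert_and_delete_id_keystring_at_ts0"


# Illustrative values only; the ticket does not state an oldest_timestamp.
recent = plan_delete_rollback(delete_ts=30, oldest_ts=10)
stale = plan_delete_rollback(delete_ts=30, oldest_ts=40)
print(recent, stale)
```

As noted in the list, the fallback only covers the _id index key, since the RecordId is not known at that point.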