Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 8.1.0-rc0, 8.0.10
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v8.0
Sprint: Repl 2025-01-20, Repl 2025-02-03, Repl 2025-02-17
Linked BF Score: 200
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In BF-35520 there was a case where a node went into rollback. The node had these oplog entries (part of a transactionally replicated vectored insert):
1. timestamp: (1730013230, 1), prevOpTime: (0, 0)
2. timestamp: (1730013230, 2), prevOpTime: (1730013230, 1)
3. timestamp: (1730013230, 3), prevOpTime: (1730013230, 2)

The node is rolling back to the stable timestamp, which is (1730013230, 1). In _restoreTxnsTableEntryFromRetryableWrites we scan the oplog (before truncation) with a filter that looks for retryable write entries (entries that have txnNumber and stmtId as top-level fields) whose timestamp is after the stable timestamp but whose prevOpTime is <= the stable timestamp. For each matching entry, we restore the txn table entry based on that information.

We would expect entry 2 to match the filter and restore the txn table entry. However, after SPM-3381 the oplog entry format was changed so that inserts are batched within an applyOps entry, so the stmtId field is now nested within the applyOps. As a result, none of the oplog entries match the filter, and we skip the step to restore the transactions table.
This results in a data inconsistency where one of the nodes does not have the correct config.transactions doc.
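
The mismatch can be illustrated with a minimal Python sketch. This is a simplified model, not the server code: the dictionary shapes, the txnNumber value, the stmtId values, and the helper names (matches_filter, make_chain, restorable) are all hypothetical, and timestamps are modeled as plain tuples. It only shows why a filter that requires a top-level stmtId selects entry 2 in the unbatched format and nothing in the applyOps format.

```python
# Simplified model of the rollback filter described above. Field names and
# document shapes are illustrative assumptions, not the real oplog schema.

STABLE_TS = (1730013230, 1)  # stable timestamp from the ticket


def matches_filter(entry, stable_ts):
    """Require top-level txnNumber and stmtId, a timestamp after the stable
    timestamp, and a prevOpTime at or before the stable timestamp."""
    return (
        "txnNumber" in entry
        and "stmtId" in entry
        and entry["ts"] > stable_ts
        and entry["prevOpTime"] <= stable_ts
    )


def make_chain(batched):
    """Build the three-entry chain from the ticket in either format."""
    chain = []
    prev = (0, 0)
    for i in range(1, 4):
        ts = (1730013230, i)
        # txnNumber 5 is an arbitrary placeholder value.
        entry = {"ts": ts, "prevOpTime": prev, "txnNumber": 5}
        if batched:
            # Batched format: stmtId is nested inside the applyOps array.
            entry["applyOps"] = [{"op": "i", "stmtId": i - 1}]
        else:
            # Unbatched format: stmtId is a top-level field.
            entry["op"] = "i"
            entry["stmtId"] = i - 1
        chain.append(entry)
        prev = ts
    return chain


def restorable(chain, stable_ts=STABLE_TS):
    """Timestamps of the entries the filter would select for restoration."""
    return [e["ts"] for e in chain if matches_filter(e, stable_ts)]


unbatched_matches = restorable(make_chain(batched=False))
batched_matches = restorable(make_chain(batched=True))
print(unbatched_matches)  # entry 2 only
print(batched_matches)    # no entries
```

In this model, entry 1 is excluded because its timestamp is not after the stable timestamp, and entry 3 is excluded because its prevOpTime is after it. In the batched chain, every entry fails the top-level stmtId check, so nothing is selected.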

This requires that a secondary (not the primary) go into rollback. When the secondary uses WT recover to stable timestamp, its config.transactions table is not correct. Because we then skip the step to restore the transactions table, the node does not have the correct config.transactions table at the end of rollback.

is related to

SERVER-55305 Retryable write may execute more than once if primary had transitioned through rollback to stable

Closed

Assignee: Moustafa Maher
Reporter: Huayu Ouyang
Participants: Githook User, Huayu Ouyang, Moustafa Maher
Votes: 0
Watchers: 9

Created: Jan 09 2025 05:19:54 PM UTC
Updated: May 19 2025 06:01:21 PM UTC
Resolved: Feb 06 2025 08:20:13 PM UTC
