- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: RTS, Transactions
- None
- Storage Engines
- 5
- 2024-07-23 - Mining crypto
- v8.0, v7.0, v6.0, v5.0
In the BF, we noticed that txn IDs in the update chain are preserved across a node restart. This causes false-positive write conflict errors and eventually leads to an Evergreen timeout.
Consider the following scenario:
1) Insert {k1:v1} happens at TS(100) with txnid = 9000.
2) Remove {k1:v1} happens at TS(300) with txnid = 9001.
3) An unclean shutdown happens with the lastCheckpointTS at TS(200).
4) Node restarts with a recoveryTS (stable ts) as TS(200).
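5) The oplog replay phase of startup recovery re-applies the remove op (step #2), but it fails with a WT_ROLLBACK error.
- Say the remove operation's snapshot has snapshot _min_txn_id = snapshot _max_txn_id = 10, but the earlier update (the step #1 insert op) still carries txn ID 9000. Because 9000 is not below the snapshot's maximum of 10, the insert looks as if it was written by a transaction that started after the snapshot was taken, so it fails the txn ID visibility check.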
The expected behavior in the above example is that RTS resets the txn ID of the earlier update (the step #1 insert op) to WT_TXN_NONE (0) after the crash. Both the txn ID visibility check and the timestamp check would then pass, allowing the remove operation to succeed without a write conflict error.
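
To make the scenario concrete, here is a minimal Python model of the txn ID visibility check and of the reset that RTS is expected to perform. It is only a sketch, not WiredTiger code: `Update`, `Snapshot`, `txn_id_visible`, `can_modify`, and `reset_stale_txn_ids` are hypothetical names, the snapshot rule is a simplified form of snapshot isolation, and the timestamp check is left out.

```python
from dataclasses import dataclass

WT_TXN_NONE = 0  # "no transaction" id; treated as visible to every snapshot


@dataclass
class Update:
    """One entry in an update chain (hypothetical, simplified)."""
    txn_id: int
    start_ts: int


@dataclass
class Snapshot:
    """Simplified snapshot: ids below snap_min are committed, ids at or
    above snap_max started after the snapshot was taken."""
    snap_min: int
    snap_max: int
    concurrent: frozenset = frozenset()


def txn_id_visible(snapshot: Snapshot, txn_id: int) -> bool:
    if txn_id == WT_TXN_NONE:
        return True
    if txn_id >= snapshot.snap_max:
        return False  # looks like a transaction from the "future"
    if txn_id < snapshot.snap_min:
        return True
    return txn_id not in snapshot.concurrent


def can_modify(snapshot: Snapshot, newest: Update) -> bool:
    """False models the write conflict (WT_ROLLBACK) case."""
    return txn_id_visible(snapshot, newest.txn_id)


def reset_stale_txn_ids(chain: list, stable_ts: int) -> list:
    """Assumed RTS-like pass: drop updates newer than the stable timestamp
    and clear the txn id on the ones that survive."""
    return [Update(WT_TXN_NONE, u.start_ts) for u in chain if u.start_ts <= stable_ts]


# After restart only the insert at TS(100) is in the checkpoint,
# but it still carries txn id 9000 from before the crash.
chain = [Update(txn_id=9000, start_ts=100)]
replay_snapshot = Snapshot(snap_min=10, snap_max=10)

conflict_before = not can_modify(replay_snapshot, chain[0])
chain = reset_stale_txn_ids(chain, stable_ts=200)
conflict_after = not can_modify(replay_snapshot, chain[0])
print(conflict_before, conflict_after)
```

In this model the stale id 9000 triggers the conflict only because post-restart txn IDs start again from a small value; once the id is cleared to `WT_TXN_NONE`, the replayed remove no longer conflicts.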