[SERVER-39489] Clarify optime creation in TransactionParticipant Created: 11/Feb/19  Updated: 29/Oct/23  Resolved: 23/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: prepare_durability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-41165 Transaction participants don't need t... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-05-20, Repl 2019-06-03
Participants:

 Description   

There is a line here that is no longer 100% true:

// Transactions do not survive term changes, so combining "getTerm" here with the
// recovery unit timestamp does not cause races.
_speculativeTransactionReadOpTime = {readTimestamp, replCoord->getTerm()};

This is no longer true for prepared transactions since they can survive term changes. I expect this is still safe since prepared transactions will do a write and not wait on the _speculativeTransactionReadOpTime but this should be double checked, and commented as such.

"OpTime creation" in the transaction participant is done here as well, very similarly.



 Comments   
Comment by Githook User [ 23/May/19 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-39489 Remove optime creation in TransactionParticipant
Branch: master
https://github.com/mongodb/mongo/commit/e0f66bf6d4aefe9e104453716aff5f47a42931e4

Comment by Esha Maharishi (Inactive) [ 15/May/19 ]

tess.avitabile, I didn't read all the above comments, but we are not planning to do that optimization before 4.2. I created a ticket for it (SERVER-41165) and marked it as related to this ticket so it's easier to follow this paper trail if or when we do do it.

Comment by Judah Schvimer [ 15/May/19 ]

Given that it's safe as matthew.russotto explained, I'll just improve the comment and maybe clean up the code a bit to keep the optimization easy.

Comment by Tess Avitabile (Inactive) [ 15/May/19 ]

judah.schvimer, I agree that we can remove speculativeTransactionReadOptime completely, since we do a noop write for all read-only transactions, including transactions with readConcern level snapshot. My only concern is whether esha.maharishi planned to make an optimization to remove the noop write for read-only transactions with readConcern level snapshot before 4.2.

Comment by Judah Schvimer [ 15/May/19 ]

matthew.russotto, Thanks! I agree. That said, do you agree that this is dead code now that we write noops for read-only transactions?

Comment by Matthew Russotto [ 15/May/19 ]

That particular comment is only saying that the _speculativeTransactionReadOpTime is coherent; that is, that the term didn't change between the time we determined the read timestamp (with preallocateSnapshot() ) and the time we called getTerm().  At the time the fact that the term didn't change during a transaction was the only guarantee, but we have a stronger guarantee of that now: This method is called with the global and RSTL lock held, so term cannot change.

Comment by Judah Schvimer [ 15/May/19 ]

OpTime creation first began in 4.0 in SERVER-34038 for the speculativeTransactionReadOpTime. It was added to wait for write concern on the read timestamp of a read-only transaction (for read concerns majority and snapshot). This made sure that no data read would be rolled back in speculative majority and speculative snapshot transactions that committed with w:majority. If the transaction did a write it would wait on the timestamp of that write as part of waiting for write concern, but if the transaction was read-only it would only wait for write concern on the previous OpTime on the client, which could be stale or null.

In 4.0, the comment associated with the OpTime creation was true “Transactions do not survive term changes, so combining "getTerm" here with the recovery unit timestamp does not cause races.”. This is no longer true due to prepared transactions surviving term changes.

The next evolution in 4.0 was in SERVER-34880 to make read concern local transactions also wait for write concern on read-only transactions. This was just to make local and majority read concern transactions behave identically since the default read concern in 4.0 was local even though we wanted transactions to default to speculative majority.

At the beginning of 4.2, we found a bug fixed in SERVER-35821. Read concern snapshot transactions were reading ahead of the all_committed timestamp. This introduced the first divergence between read concern local/majority and snapshot transactions. Snapshot transactions now read at the all_committed timestamp and local and majority transactions read at lastApplied.

This code evolved in 4.2 with SERVER-38850. Somewhat unrelatedly, we began performing noop writes when returning NoSuchTransaction errors to make sure TransientTransactionError labels on retried commitTransaction commands ensure a transaction definitively did not commit.

These noop writes became related, however, in SERVER-38906. The goal of SERVER-38906 was to not do timestamped reads ahead of the all_committed timestamp. This built on top of SERVER-35821. Rather than local/majority reading at lastApplied they began reading with no timestamp. This begged the question, how should read-only local/majority read concern transactions wait for the data they read to be majority committed. (There is an argument to be made that local read concern transactions need not wait for the data they read to be majority committed, and do not need to advance their write concern if they do no writes. SERVER-38906 decided to maintain current behavior that read concern local and majority transactions in a single replica set have identical behavior.) Calling setLastOpToSystemLastOpTime was not an option due to SERVER-39364, so we chose to do a noop write instead and wait for write concern on the timestamp of that noop write. At this point, we began only setting a speculativeTransactionReadOpTime for read concern snapshot transactions. Read concern local and majority transactions used the noop writes’ OpTimes to wait for the data read in read-only transactions to be majority committed. The noop write, however, was not limited to local and majority transactions. Since read-only transactions always do a noop write, the transaction will always either wait for write concern on the transaction’s noop write for read-only transactions, or a write in the transaction if it wasn’t read-only. They will never wait on the speculativeTransactionReadOpTime.

As a result, it should be safe to remove the speculativeTransactionReadOptime completely. We currently use it for logging, but the read timestamp goes in the SingleTransactionStats as well, which is the proper place for that information. The testing added in SERVER-38906 and speculative_read_transaction.js already tests the behavior we want.

Comment by Gregory McKeon (Inactive) [ 25/Mar/19 ]

judah.schvimer can we get a prepare label on this?

Generated at Thu Feb 08 04:52:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.