[SERVER-36493] Invalidate in-memory prepared transaction state on replication rollback Created: 07/Aug/18  Updated: 29/Oct/23  Resolved: 11/Dec/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.7

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Pavithra Vetriselvan
Resolution: Fixed Votes: 0
Labels: prepare_durability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-37235 invalidateSessions calls inside the O... Closed
is related to SERVER-37811 Replication rollback invalidates all ... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-10-08, Repl 2018-10-22, Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17
Participants:

 Description   

Before calling recoverToStableTimestamp, we must abort any prepared transactions. This requires iterating through all active sessions and checking whether each session has a prepared transaction. If it does, we must abort the storage transaction. We will not update ServerTransactionsMetrics or make any writes to the transactions table.

There are several scenarios that we have to consider, but in each it should be safe to abort the prepared transaction and reconstruct the state during startup recovery.

For the following examples, prepare timestamp = the oldest prepare timestamp whose corresponding commit/abort oplog entry is not majority committed.

Case #1: stable timestamp is before the prepare timestamp

When we roll back to a stable timestamp that is before the prepare timestamp, we can safely clear the prepared transaction state because we will reconstruct the prepared transaction and its commit/abort if one exists. We are not writing to the transactions table and will call the corresponding functions in TransactionParticipant (prepareTransaction, commitPreparedTransaction, and abortActiveTransaction) during startup recovery. Therefore, the oldestActiveOplogEntryOpTimes and oldestNonMajorityCommittedOpTimes will be re-populated accordingly.

Case #2: stable timestamp is at the prepare timestamp

Since we will be reconstructing prepared transactions even if the stable timestamp is AT the prepare timestamp, this case is the same as Case #1.

Case #3: stable timestamp is after the prepare timestamp

The stable timestamp would only advance farther than the prepare timestamp if its corresponding commit/abort oplog entry has been majority committed. This would mean that neither the prepare nor the commit/abort should be rolled back, since the common point would have to be after the majority commit point. In this case, we technically should no longer have information about the prepared transaction, so it's safe to clear the data structure. If another prepared transaction exists, we would defer to Case #1 or Case #2.
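The three cases above can be summarized in a small sketch. Everything here is hypothetical and for illustration only: Timestamp is a plain integer stand-in, and RollbackCase, classifyRollbackCase and handledLikeCaseOne are not names from the server code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in; the server has its own Timestamp type.
using Timestamp = std::uint64_t;

enum class RollbackCase {
    kStableBeforePrepare,  // Case #1
    kStableAtPrepare,      // Case #2
    kStableAfterPrepare    // Case #3
};

// Classifies one prepared transaction relative to the stable timestamp.
// decisionMajorityCommitted says whether its commit/abort oplog entry is
// majority committed (an input assumed to be known by the caller).
inline RollbackCase classifyRollbackCase(Timestamp stable,
                                         Timestamp prepare,
                                         bool decisionMajorityCommitted) {
    (void)decisionMajorityCommitted;  // only read by the assert below
    if (stable < prepare) return RollbackCase::kStableBeforePrepare;
    if (stable == prepare) return RollbackCase::kStableAtPrepare;
    // Per the description, the stable timestamp only passes the prepare
    // timestamp once the commit/abort entry is majority committed.
    assert(decisionMajorityCommitted);
    return RollbackCase::kStableAfterPrepare;
}

// Cases #1 and #2 share one path: clear the in-memory state and let recovery
// reconstruct the prepare and its commit/abort if one exists.
inline bool handledLikeCaseOne(RollbackCase c) {
    return c != RollbackCase::kStableAfterPrepare;
}
```

For example, classifyRollbackCase(10, 10, false) yields Case #2, which handledLikeCaseOne reports as following the same path as Case #1.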

We already have a function in session_catalog_mongod that invalidates sessions; it is called in OpObserverImpl::onReplicationRollback and is used to invalidate sessions that had operations that would have been rolled back. Similarly, we would create a function called invalidateSessionsWithPreparedTransactions that would scan all sessions and call txnParticipant->shutdown() on prepared transactions.

Currently, the shutdown() function is only used on shutdown and aborts the storage transaction of any transaction. We will add an isInRollback flag to this function; when the flag is set to true, the function will abort only prepared transactions. We will also clear the oldestActiveOplogEntryOpTimes and oldestNonMajorityCommittedOpTimes in this function.
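A minimal sketch of the proposed shutdown(isInRollback) behavior and the scanning function, under stated assumptions. TxnParticipantSketch and OpTimeTrackingSketch are hypothetical simplifications rather than the server's classes, the op times are plain integers, and the choice to clear the two sets on every shutdown call that proceeds is an assumption, since the description does not say when exactly they are cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

enum class TxnState { kNone, kInProgress, kPrepared };

// Hypothetical holder for the two op time sets named in the description.
struct OpTimeTrackingSketch {
    std::set<std::uint64_t> oldestActiveOplogEntryOpTimes;
    std::set<std::uint64_t> oldestNonMajorityCommittedOpTimes;
};

// Hypothetical, heavily simplified transaction participant.
struct TxnParticipantSketch {
    TxnState state;
    bool storageTxnOpen;

    // Returns true if a storage transaction was aborted.
    bool shutdown(bool isInRollback, OpTimeTrackingSketch& tracking) {
        // During rollback, leave non-prepared transactions untouched.
        if (isInRollback && state != TxnState::kPrepared) return false;
        // Assumed invariant: a prepared transaction holds a storage transaction.
        assert(state != TxnState::kPrepared || storageTxnOpen);
        const bool aborted = storageTxnOpen;
        storageTxnOpen = false;  // abort the storage transaction
        state = TxnState::kNone; // drop the in-memory transaction state
        tracking.oldestActiveOplogEntryOpTimes.clear();
        tracking.oldestNonMajorityCommittedOpTimes.clear();
        return aborted;
    }
};

// Scans every participant and aborts only the prepared ones.
inline std::size_t invalidateSessionsWithPreparedTransactions(
    std::vector<TxnParticipantSketch>& participants,
    OpTimeTrackingSketch& tracking) {
    std::size_t aborted = 0;
    for (auto& participant : participants) {
        if (participant.shutdown(true, tracking)) ++aborted;
    }
    return aborted;
}
```

With one in-progress, one prepared and one idle participant, the scan aborts only the prepared one; calling shutdown(false, ...) afterwards still aborts the in-progress transaction, matching the existing shutdown behavior.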

Finally, we will have to decide how to properly update ServerTransactionsMetrics during a rollback. For example, if we call abortActiveTransaction on a prepared transaction, it will increment the count of both totalAborted and totalPreparedThenAborted.
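To make the metrics concern concrete, here is a small sketch contrasting the two abort paths. MetricsSketch and PreparedTxnSketch are hypothetical stand-ins, and abortForRollback is an illustrative name for an abort that skips the counters; only the two counter names and abortActiveTransaction come from this ticket.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two counters named in the description.
struct MetricsSketch {
    std::uint64_t totalAborted;
    std::uint64_t totalPreparedThenAborted;
};

struct PreparedTxnSketch {
    bool active;
    bool prepared;
};

// Regular abort path: bumps both counters for a prepared transaction.
inline void abortActiveTransaction(PreparedTxnSketch& txn, MetricsSketch& metrics) {
    assert(txn.active);
    ++metrics.totalAborted;
    if (txn.prepared) ++metrics.totalPreparedThenAborted;
    txn.active = false;
    txn.prepared = false;
}

// Rollback path (assumed): drops the in-memory state without touching metrics,
// because recovery is expected to reconstruct the transaction afterwards.
inline void abortForRollback(PreparedTxnSketch& txn) {
    txn.active = false;
    txn.prepared = false;
}
```

Using the regular path during rollback would count a transaction as aborted even though recovery may later reconstruct and commit it, which is why the counters need separate handling.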

Since we cannot write an integration test for this ticket until the state transition work is in, we will unit test the shutdown and invalidateSessionsWithPreparedTransactions functions.



 Comments   
Comment by Githook User [ 11/Dec/18 ]

Author: Pavi Vetriselvan (username: pvselvan, email: pvselvan@umich.edu)

Message: SERVER-36493 invalidate in-memory state of prepared txns on repl rollback
Branch: master
https://github.com/mongodb/mongo/commit/6f6748705abc029db91c91505ae2c0047049bc46

Comment by Kaloian Manassiev [ 08/Nov/18 ]

What this quote describes is essentially the same as the killAllExpiredTransactions loop. Inside the scanSessions callback you have the session available, so you can get the participant from it, do whatever atomic checks need to be done (isPrepared, for example), and decide whether to call kill on the session.

Once all the kill tokens are collected, you can either call SessionCatalog::checkoutSessionForKill inline or on a separate thread if you don't want to block the caller.

One thing to consider is: what happens if a transaction becomes prepared after scanSessions has already skipped over it because it was not prepared at the time? Or will you abort all transactions regardless of what state they are in? See this comment which is related to this situation.
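The scan-then-kill flow and the race described above can be sketched as follows. This is a single-threaded illustration with hypothetical types (SessionSketch, KillTokenSketch, SessionCatalogSketch, killPreparedSessions); the method names scanSessions and checkoutSessionForKill are borrowed from this comment, but their signatures here are assumptions, not the server's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct SessionSketch {
    int id;
    bool prepared;
    bool killed;
};

struct KillTokenSketch {
    int sessionId;
};

struct SessionCatalogSketch {
    std::vector<SessionSketch> sessions;

    // Invokes the callback once per session.
    void scanSessions(const std::function<void(const SessionSketch&)>& callback) const {
        for (const auto& session : sessions) callback(session);
    }

    // Returns the session for a token, or nullptr if it no longer exists.
    SessionSketch* checkoutSessionForKill(const KillTokenSketch& token) {
        for (auto& session : sessions) {
            if (session.id == token.sessionId) return &session;
        }
        return nullptr;
    }
};

// Phase 1 collects kill tokens for prepared sessions; phase 2 kills them.
inline std::size_t killPreparedSessions(SessionCatalogSketch& catalog) {
    std::vector<KillTokenSketch> tokens;
    catalog.scanSessions([&tokens](const SessionSketch& session) {
        if (session.prepared) tokens.push_back(KillTokenSketch{session.id});
    });
    std::size_t killed = 0;
    for (const auto& token : tokens) {
        SessionSketch* session = catalog.checkoutSessionForKill(token);
        assert(session != nullptr);
        session->prepared = false;
        session->killed = true;
        ++killed;
    }
    return killed;
}
```

A session that becomes prepared after the scan has passed it gets no kill token and survives that pass, which is the gap the question above points at; aborting all transactions regardless of state would be one way to avoid depending on the scan-time check.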

Comment by Judah Schvimer [ 08/Nov/18 ]

We will likely have to create a new function in session_catalog_mongod that both aborts prepared transactions and invalidates the associated sessions. Alternatively, we could call scanSessions and pass in a custom function that aborts prepared transactions (txnParticipant->abortActiveTransaction()).

kaloian.manassiev, any input on the best way to accomplish this with your refactor?

Comment by Judah Schvimer [ 10/Oct/18 ]

Rather than invalidating sessions between the stable timestamp and the common point, we'll need to abort any prepared transactions and possibly invalidate those sessions before calling recoverToStableTimestamp. This will make replication recovery more in line with startup recovery anyways, and it will reconstruct the prepared transactions before users can begin any reads.

Comment by Judah Schvimer [ 04/Oct/18 ]

We'll have to consider if we have to invalidate any sessions between the stable timestamp and the common point before they are recovered.
