[SERVER-36493] Invalidate in-memory prepared transaction state on replication rollback Created: 07/Aug/18 Updated: 29/Oct/23 Resolved: 11/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.7 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Pavithra Vetriselvan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2018-10-08, Repl 2018-10-22, Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03, Repl 2018-12-17 |
| Participants: | |
| Description |
|
Before calling recoverToStableTimestamp, we must abort any prepared transactions. This requires iterating through all active sessions and checking whether each session has a prepared transaction; if it does, we must abort its storage transaction. We will not update ServerTransactionsMetrics or make any writes to the transactions table. There are several scenarios to consider, but in each one it should be safe to abort the prepared transaction and reconstruct its state during startup recovery.

For the following examples, "prepare timestamp" means the oldest prepare timestamp whose corresponding commit/abort oplog entry is not majority committed.

Case #1: stable timestamp is before the prepare timestamp
When we roll back to a stable timestamp that is before the prepare timestamp, we can safely clear the prepared transaction state, because we will reconstruct the prepared transaction and its commit/abort, if one exists. We do not write to the transactions table, and during startup recovery we call the corresponding TransactionParticipant functions (prepareTransaction, commitPreparedTransaction, and abortActiveTransaction). Therefore, oldestActiveOplogEntryOpTimes and oldestNonMajorityCommittedOpTimes will be repopulated accordingly.

Case #2: stable timestamp is at the prepare timestamp
Since we reconstruct prepared transactions even when the stable timestamp is AT the prepare timestamp, this case is the same as Case #1.

Case #3: stable timestamp is after the prepare timestamp
The stable timestamp only advances past the prepare timestamp if the corresponding commit/abort oplog entry has been majority committed. In that case, neither the prepare nor the commit/abort can be rolled back, since the common point must be after the majority commit point. We should therefore no longer hold any information about the prepared transaction, so it is safe to clear the data structure.
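The case analysis above reduces to comparing two timestamps. As a minimal C++ sketch, assuming timestamps can be modeled as unsigned 64-bit integers (the server's own timestamp type is not reproduced), the hypothetical helpers classifyPrepared and reconstructedByRecovery encode which case applies and whether recovery rebuilds the transaction:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: timestamps modeled as plain integers, not the server's type.
using Ts = std::uint64_t;

// Hypothetical labels for the three cases in the description.
enum class RollbackCase {
    kStableBeforePrepare,  // Case #1
    kStableAtPrepare,      // Case #2
    kStableAfterPrepare,   // Case #3
};

// prepareTs: the oldest prepare timestamp whose commit/abort oplog entry is
// not majority committed.
RollbackCase classifyPrepared(Ts stableTs, Ts prepareTs) {
    if (stableTs < prepareTs) return RollbackCase::kStableBeforePrepare;
    if (stableTs == prepareTs) return RollbackCase::kStableAtPrepare;
    return RollbackCase::kStableAfterPrepare;
}

// Cases #1 and #2: recovery replays the prepare (and any commit/abort), so the
// in-memory state can be discarded and rebuilt. Case #3: the transaction was
// already resolved by a majority-committed entry, so nothing is rebuilt.
bool reconstructedByRecovery(RollbackCase c) {
    return c != RollbackCase::kStableAfterPrepare;
}
```

For example, classifyPrepared(10, 10) yields kStableAtPrepare, which recovery treats exactly like Case #1.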
In Case #3, if another prepared transaction exists, it falls under Case #1 or Case #2.

We already have a function in session_catalog_mongod that invalidates sessions; it is called in OpObserverImpl::onReplicationRollback to invalidate sessions that had operations that would have been rolled back. Similarly, we will create a function called invalidateSessionsWithPreparedTransactions that scans all sessions and calls txnParticipant->shutdown() on those with prepared transactions. Currently, shutdown() is used only at server shutdown, where it aborts the storage transaction of every transaction. We will add an isInRollback flag to it so that, when the flag is true, it aborts only prepared transactions. We will also clear oldestActiveOplogEntryOpTimes and oldestNonMajorityCommittedOpTimes in this function.

Finally, we will have to decide how to properly update ServerTransactionsMetrics during a rollback. For example, calling abortActiveTransaction on a prepared transaction increments both totalAborted and totalPreparedThenAborted. Since we cannot write an integration test for this ticket until the state transition work is in, we will unit test shutdown() and invalidateSessionsWithPreparedTransactions. |
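A minimal C++ sketch of the proposed invalidateSessionsWithPreparedTransactions and the isInRollback flag on shutdown(). Every type here (Metrics, Participant, SessionMap) is a hypothetical simplification, not the server's ServerTransactionsMetrics or TransactionParticipant, and clearing both op-time sets wholesale in the sweep is an assumption about where that clearing happens:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-ins; names follow the ticket, real signatures differ.
struct Metrics {
    std::set<std::uint64_t> oldestActiveOplogEntryOpTimes;
    std::set<std::uint64_t> oldestNonMajorityCommittedOpTimes;
    int totalAborted = 0;              // deliberately left untouched below
    int totalPreparedThenAborted = 0;  // deliberately left untouched below
};

struct Participant {
    bool prepared = false;
    bool storageTxnOpen = false;

    // shutdown() with the proposed isInRollback flag: at server shutdown it
    // aborts any open storage transaction; during rollback it aborts only
    // prepared ones. No transactions-table write and no abort counters.
    void shutdown(bool isInRollback) {
        if (isInRollback && !prepared) return;
        storageTxnOpen = false;
        prepared = false;
    }
};

// Hypothetical session id -> participant map standing in for the catalog.
using SessionMap = std::map<int, Participant>;

// Hypothetical invalidateSessionsWithPreparedTransactions, meant to run before
// recoverToStableTimestamp. Returns the ids of the sessions it aborted.
std::vector<int> invalidateSessionsWithPreparedTransactions(SessionMap& sessions,
                                                            Metrics& metrics) {
    std::vector<int> aborted;
    for (auto& [id, participant] : sessions) {
        if (participant.prepared) {
            participant.shutdown(/*isInRollback=*/true);
            assert(!participant.storageTxnOpen);
            aborted.push_back(id);
        }
    }
    // Assumption: clear both sets wholesale; recovery repopulates them when it
    // reconstructs the prepared transactions.
    metrics.oldestActiveOplogEntryOpTimes.clear();
    metrics.oldestNonMajorityCommittedOpTimes.clear();
    return aborted;
}
```

Leaving totalAborted and totalPreparedThenAborted untouched reflects the plan not to go through abortActiveTransaction, which would bump both counters; how those metrics should ultimately be adjusted remains the open question in the description.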
| Comments |
| Comment by Githook User [ 11/Dec/18 ] |
|
Author: Pavi Vetriselvan (pvselvan, pvselvan@umich.edu)
Message: |
| Comment by Kaloian Manassiev [ 08/Nov/18 ] |
|
What this quote describes is essentially the same as the killAllExpiredTransactions loop. Inside the scanSessions callback you have the session available, so you can get the participant from it, do whatever atomic checks need to be done (isPrepared, for example), and decide whether to call kill on the session. Once all the kill tokens are collected, you can call SessionCatalog::checkoutSessionForKill either inline or on a separate thread if you don't want to block the caller. One thing to consider: what happens if a transaction becomes prepared after scanSessions has skipped over it because it was not yet prepared? Or will you abort all transactions regardless of their state? See this comment, which is related to this situation.
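The suggestion above is a two-phase pattern: pick victims inside scanSessions, then kill them afterwards. The C++ sketch below models that shape with hypothetical types (Catalog, Txn, collectKillTokens and killAll are illustrative, not server APIs; the real code passes kill tokens to SessionCatalog::checkoutSessionForKill) and makes the race concrete: with abortAll set to false, a session that becomes prepared between the scan and the kill is missed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical per-session state.
struct Txn {
    bool prepared = false;
    bool killed = false;
};

// Hypothetical catalog; every method takes the catalog mutex.
class Catalog {
public:
    void add(int id) {
        std::lock_guard<std::mutex> lk(_mutex);
        _txns[id] = Txn{};
    }
    void prepare(int id) {
        std::lock_guard<std::mutex> lk(_mutex);
        _txns.at(id).prepared = true;
    }
    Txn get(int id) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _txns.at(id);
    }
    // Visits each session under the mutex, so the callback sees a consistent
    // view of that session's state.
    void scanSessions(const std::function<void(int, const Txn&)>& visit) {
        std::lock_guard<std::mutex> lk(_mutex);
        for (const auto& [id, txn] : _txns) visit(id, txn);
    }
    // Acts on one collected token; a session that has since gone away is skipped.
    void killSession(int id) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _txns.find(id);
        if (it != _txns.end()) it->second.killed = true;
    }

private:
    std::mutex _mutex;
    std::map<int, Txn> _txns;
};

// Phase 1: collect tokens. With abortAll == false only sessions prepared at
// scan time are selected, which leaves the race window open.
std::vector<int> collectKillTokens(Catalog& catalog, bool abortAll) {
    std::vector<int> tokens;
    catalog.scanSessions([&](int id, const Txn& txn) {
        if (abortAll || txn.prepared) tokens.push_back(id);
    });
    return tokens;
}

// Phase 2: kill outside the scan (this could also run on a separate thread).
void killAll(Catalog& catalog, const std::vector<int>& tokens) {
    for (int id : tokens) catalog.killSession(id);
}
```

In this model, preparing a session after collectKillTokens(catalog, false) returns leaves it unkilled, while passing abortAll = true selects every session regardless of state; choosing between those two behaviors is the open question in the comment.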
| Comment by Judah Schvimer [ 08/Nov/18 ] |
kaloian.manassiev, any input on the best way to accomplish this with your refactor? |
| Comment by Judah Schvimer [ 10/Oct/18 ] |
|
Rather than invalidating sessions between the stable timestamp and the common point, we'll need to abort any prepared transactions, and possibly invalidate those sessions, before calling recoverToStableTimestamp. This will make replication recovery more in line with startup recovery anyway, and it will reconstruct the prepared transactions before users can begin any reads.
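The ordering proposed here can be expressed as a small state machine. In this hypothetical C++ sketch (RollbackDriver and its phase names are illustrative, and the method bodies are stubs that only record progress), each step asserts that the previous one has completed, and reads are refused until prepared transactions have been reconstructed:

```cpp
#include <cassert>

// Hypothetical driver enforcing the proposed rollback order.
class RollbackDriver {
public:
    enum class Phase { kRollbackStarted, kPreparedAborted, kRecovered, kReconstructed };

    // 1. Abort prepared transactions (without writing to the transactions table).
    void abortPreparedTransactions() {
        assert(_phase == Phase::kRollbackStarted);
        _phase = Phase::kPreparedAborted;
    }
    // 2. Roll storage back to the stable timestamp (stub).
    void recoverToStableTimestamp() {
        assert(_phase == Phase::kPreparedAborted);
        _phase = Phase::kRecovered;
    }
    // 3. Rebuild prepared transactions the same way startup recovery does (stub).
    void reconstructPreparedTransactions() {
        assert(_phase == Phase::kRecovered);
        _phase = Phase::kReconstructed;
    }
    // Reads are allowed only once the prepared state has been rebuilt.
    bool canServeReads() const { return _phase == Phase::kReconstructed; }
    Phase phase() const { return _phase; }

private:
    Phase _phase = Phase::kRollbackStarted;
};
```

Calling the steps out of order trips an assertion, which mirrors the requirement that prepared transactions be aborted before recoverToStableTimestamp and rebuilt before any reads.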
| Comment by Judah Schvimer [ 04/Oct/18 ] |
|
We'll have to consider whether we need to invalidate any sessions between the stable timestamp and the common point before they are recovered.