[SERVER-37348] TransactionReaper and periodic transaction abort thread shouldn't abort transactions on secondaries Created: 27/Sep/18  Updated: 29/Oct/23  Resolved: 25/Feb/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Matthew Saltz (Inactive)
Resolution: Fixed Votes: 0
Labels: prepare_errors
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-36483 Transaction reaper should not reap 'c... Closed
depends on SERVER-36485 ‘killSessions’ (for one session) and ... Closed
is depended on by SERVER-38297 Killing session on a secondary curren... Closed
is depended on by SERVER-38876 Ensure secondary user operations cann... Closed
Related
related to SERVER-38297 Killing session on a secondary curren... Closed
related to SERVER-40487 Stop running the RstlKillOpthread whe... Backlog
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11
Participants:

 Description   

The periodic transaction killer kills unprepared transactions that run longer than 60 seconds. It shouldn't kill transactions on secondaries. Currently, a transaction is very unlikely to stay in the "kInProgress" state for longer than 60 seconds on a secondary, since such transactions become prepared right after all their write operations are applied; this will become more likely once we start to support transactions consisting of multiple oplog entries.

The session reaper and other periodic threads may have the same issue and need auditing.



 Comments   
Comment by Githook User [ 25/Feb/19 ]

Author: Matthew Saltz (saltzm, matthew.saltz@mongodb.com)

Message: SERVER-37348 Make replication applier batches uninterruptible
Branch: master
https://github.com/mongodb/mongo/commit/35372c0918d1e6e15cc95ecc2883c080c1b198dc

Comment by Matthew Saltz (Inactive) [ 17/Jan/19 ]

Things to do for this ticket (per discussion):
1. Make secondary oplog application batches uninterruptible. Since the session remains checked out while all of a transaction's operations are applied and the transaction is put into prepare, TransactionParticipants should never be accessible in an unprepared state to other operations, so this should be sufficient for preventing transactions from being aborted on secondaries.
2. Add an invariant when checking the session back in on secondaries that the transaction participant is not in progress.
3. Add a test that the periodic transaction abort thread can't kill a transaction that's stalled in secondary oplog application of the prepare oplog entry.
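
Items 1 and 2 can be illustrated with a minimal Python sketch. It is a toy model, not the server code: `OperationContext`, `uninterruptible`, `Session`, and `apply_prepared_txn_on_secondary` are hypothetical names chosen for illustration. The idea shown is that a kill request arriving while the batch is being applied has no effect until the guard is released, and by then the transaction is already prepared.

```python
import threading
from contextlib import contextmanager


class Interrupted(Exception):
    """Raised when a killed operation reaches an interrupt check."""


class OperationContext:
    """Hypothetical stand-in for a per-operation context."""

    def __init__(self):
        self._killed = threading.Event()
        self._uninterruptible_depth = 0

    def kill(self):
        # Called by another thread, e.g. a periodic abort job.
        self._killed.set()

    @contextmanager
    def uninterruptible(self):
        # While the guard is held, interrupt checks do not raise.
        self._uninterruptible_depth += 1
        try:
            yield
        finally:
            self._uninterruptible_depth -= 1

    def check_for_interrupt(self):
        if self._killed.is_set() and self._uninterruptible_depth == 0:
            raise Interrupted()


class Session:
    """Hypothetical session holding only a transaction state."""

    def __init__(self):
        self.txn_state = "none"  # "none", "in_progress", or "prepared"


def apply_prepared_txn_on_secondary(op_ctx, session, ops, applied):
    # Item 1: the whole batch runs under the uninterruptible guard.
    with op_ctx.uninterruptible():
        session.txn_state = "in_progress"
        for op in ops:
            op_ctx.check_for_interrupt()  # does not raise inside the guard
            applied.append(op)
        session.txn_state = "prepared"
    # Item 2: invariant at check-in; the participant must not be in progress.
    assert session.txn_state != "in_progress", "unprepared txn at check-in"
```

In this model the kill flag stays set, so an interrupt check made after the guard is released still raises; only the batch itself is protected.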

Comment by Judah Schvimer [ 17/Jan/19 ]

Per discussion, this can be fixed by making secondary oplog application batches uninterruptible.

Comment by Siyuan Zhou [ 16/Jan/19 ]

That's an interesting idea. If we mark transactions in secondary mode, we need to do that whenever we enter secondary mode and unmark them on stepup. Instead, we can enable the transaction reaper on stepup and disable it on stepdown, as we did for the migration manager, assuming the lifecycle of a transaction is entirely managed by the primary (which I believe is true). The transactions for dbhash that are allowed on secondaries may make things different. We can probably just leave them alone, since they are not allowed in production?
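
A rough sketch of the enable-on-stepup, disable-on-stepdown option, in Python for brevity. `RoleGatedJob` and its method names are hypothetical and do not correspond to any class in the server; the sketch only shows a periodic job that does nothing unless the node has stepped up.

```python
class RoleGatedJob:
    """Hypothetical periodic job that only runs while the node is primary."""

    def __init__(self, work):
        self._work = work      # callable doing the actual reaping
        self._enabled = False  # nodes start out as non-primary here

    def on_step_up(self):
        self._enabled = True

    def on_step_down(self):
        self._enabled = False

    def run_once(self):
        # Invoked by the periodic scheduler; returns True if work was done.
        if not self._enabled:
            return False
        self._work()
        return True
```

The cost of this approach is that every role transition has to call the hooks; a missed call leaves the job in the wrong state.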

For the transaction reaper, another option is to let it run all the time, but check whether we are master before killing anything under the RSTL lock.
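
This second option could look roughly like the Python sketch below. All names are hypothetical: `ReplicationState.lock` is a plain mutex standing in for the RSTL, and the 60-second limit is borrowed from the ticket description purely for illustration. The role check and the kill happen under the same lock, so in this model the node cannot change role between the check and the abort.

```python
import threading


class ReplicationState:
    """Hypothetical replication state guarded by a single lock."""

    def __init__(self, is_primary=False):
        self.lock = threading.Lock()  # stand-in for the RSTL
        self.is_primary = is_primary


def abort_expired(repl, transactions, now, limit_secs=60):
    """Abort unprepared transactions older than limit_secs, primaries only."""
    aborted = []
    with repl.lock:
        if not repl.is_primary:
            return aborted  # never kill anything on a secondary
        for txn in transactions:
            expired = now - txn["start"] > limit_secs
            if txn["state"] == "in_progress" and expired:
                txn["state"] = "aborted"
                aborted.append(txn["id"])
    return aborted
```

Unlike the stepup/stepdown hooks, the job keeps running on every node and pays for a lock acquisition on each pass, but it needs no bookkeeping on role transitions.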

However, the lifecycle of a session isn't very clear to me. I have an impression that it is orthogonal to transactions and can be used for other purposes on both primary and secondary, but it actually cleans up the session / transaction participant and writes into the transaction table, which affects the lifecycle of a transaction. I guess the sharding team can shed some light on that.

Comment by Matthew Saltz (Inactive) [ 15/Jan/19 ]

Is it generally true that we never want to be able to kill a transaction that's on a secondary? I'm wondering if it would make sense, when we start a transaction on the secondary here, to tag the transaction as unkillable due to being on a secondary (or something like that).
