[SERVER-38876] Ensure secondary user operations cannot abort transactions being applied from the oplog
Created: 07/Jan/19  Updated: 29/Oct/23  Resolved: 21/Mar/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Task
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: prepare_errors
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-37348 TransactionReaper and periodic transa... Closed
depends on SERVER-39139 Remove testing support for secondary ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-01-28, Sharding 2019-02-25, Sharding 2019-03-25
Participants:

 Description   

If a user attempts to use a Session on a secondary for an operation that accepts a transaction number, that operation will check out the session. If the transaction number is greater than the one in use by a secondary oplog application thread for the same session, then it could abort that secondary oplog application transaction. This could lead to a crash or data corruption.

One way to fix this could be to prevent user operations on secondaries from checking out a session. Transactions and retryable writes are not allowed on secondaries, so this should be fine (an exception can be made for testing, though this might cause test failures).

Alternatively, reads outside of a transaction could be prevented from checking out a session, since writes are already prevented on secondaries and secondary transactions are already prevented except in test mode.
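
A minimal sketch of the first option, assuming a simplified model of the checkout decision. The names `Request`, `may_check_out_session`, `from_oplog_applier`, and `test_mode` are hypothetical and do not correspond to server code; the point is only that a user operation on a secondary never reaches session checkout unless the testing exception applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    # Illustrative fields only; they do not mirror the server's request types.
    txn_number: Optional[int]   # None when the operation carries no transaction number
    from_oplog_applier: bool    # True for secondary oplog application threads


def may_check_out_session(req: Request, is_primary: bool, test_mode: bool = False) -> bool:
    """Hypothetical guard: on a secondary, only oplog application checks out a session."""
    if req.txn_number is None:
        # No transaction number means there is no session to check out.
        return False
    if is_primary or req.from_oplog_applier:
        return True
    # User operation on a secondary: permitted only under the testing exception.
    return test_mode
```

With this guard, `may_check_out_session(Request(6, False), is_primary=False)` is false, so a user request with a higher transaction number cannot touch a session that oplog application is using.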



 Comments   
Comment by Githook User [ 21/Mar/19 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-38876 Ensure secondary user operations cannot abort transactions being applied from the oplog
Branch: master
https://github.com/mongodb/mongo/commit/9614d52c9e83afdde0ae22e16de97f290a08c206

Comment by Esha Maharishi (Inactive) [ 04/Feb/19 ]

Per offline conversation with judah.schvimer, SERVER-39139 will be implemented to ban the behavior described by this ticket, and the work for this ticket will be to add a test that ensures the ban works as expected for this case.

Comment by Judah Schvimer [ 24/Jan/19 ]

"... meaning it's only possible for this to occur if secondary oplog application can check a Session back in with a transaction in progress."

This is due to the uassert that fires if the transaction is prepared, correct?

If so, then I think the invariant created by SERVER-37348 should cover this. Do you agree? I think it's still worth adding a test for this case as well.

Comment by Esha Maharishi (Inactive) [ 23/Jan/19 ]

My understanding is that this:

"If the transaction number is greater than the one in use by a secondary oplog application thread for the same session, then it could abort that secondary oplog application transaction."

would happen because TransactionParticipant::beginOrContinue for the request with the higher transaction number could result in calling TransactionParticipant::_abortTransactionOnSession through TransactionParticipant::_setNewTxnNumber (for either a retryable write or a transaction).

However, it's only possible to reach TransactionParticipant::beginOrContinue under a checked out session, meaning it's only possible for this to occur if secondary oplog application can check a Session back in with a transaction in progress.
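
The reasoning above can be illustrated with a toy model. This is a sketch under stated assumptions, not the server's implementation: `ToySession` and its methods are hypothetical stand-ins, a non-blocking lock stands in for session checkout, and rejecting older transaction numbers is assumed behavior for the model.

```python
import threading


class ToySession:
    """Toy stand-in for a server session; not the real Session class."""

    def __init__(self):
        self._checkout_lock = threading.Lock()  # held while the session is checked out
        self.active_txn_number = -1
        self.txn_in_progress = False
        self.aborted = []                       # transaction numbers aborted by a newer request

    def check_out(self) -> bool:
        """Try to check the session out; False means another holder has it."""
        return self._checkout_lock.acquire(blocking=False)

    def check_in(self) -> None:
        self._checkout_lock.release()

    def begin_or_continue(self, txn_number: int) -> None:
        """Loosely follows the beginOrContinue -> _setNewTxnNumber path described above."""
        if not self._checkout_lock.locked():
            raise RuntimeError("session must be checked out first")
        if txn_number < self.active_txn_number:
            # Assumption for this model: older transaction numbers are rejected.
            raise ValueError("transaction number is older than the active one")
        if txn_number > self.active_txn_number:
            if self.txn_in_progress:
                # Analogue of _abortTransactionOnSession: the newer number wins.
                self.aborted.append(self.active_txn_number)
            self.active_txn_number = txn_number
            self.txn_in_progress = True


session = ToySession()

# Oplog application checks the session out and starts transaction 5.
session.check_out()
session.begin_or_continue(5)

# While it is checked out, a user operation cannot check out the same session.
user_blocked = not session.check_out()

# The hazard: the session is checked back in with transaction 5 still in progress.
session.check_in()
session.check_out()
session.begin_or_continue(6)  # the higher number aborts the in-progress transaction 5
```

In this model the abort is reachable only after `check_in()` is called with a transaction still in progress, which matches the condition stated in the comment.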
