[SERVER-39139] Remove testing support for secondary transactions Created: 22/Jan/19 Updated: 29/Oct/23 Resolved: 06/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.9 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Repl 2019-02-11, Repl 2019-02-25, Repl 2019-03-11 | ||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 18 | ||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
We currently support secondary transactions when test commands are enabled for background "dbhash". This is not going to work due to |
| Comments |
| Comment by Githook User [ 06/Mar/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'username': 'vessy-mongodb', 'email': 'vesselina.ratcheva@10gen.com'}Message: This reverts commit a74b2f39025fee2c59aa5437deea6e06f05e18ca. |
| Comment by Githook User [ 06/Mar/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}Message: |
| Comment by Githook User [ 06/Mar/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}Message: |
| Comment by Githook User [ 28/Feb/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'username': 'vessy-mongodb', 'email': 'vesselina.ratcheva@10gen.com'}Message: Revert " This reverts commit 58fad7d7efa275beb4a6a83f90d3dd222bbb534b. |
| Comment by Githook User [ 26/Feb/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'username': 'vessy-mongodb', 'email': 'vesselina.ratcheva@10gen.com'}Message: |
| Comment by Githook User [ 11/Feb/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}Message: |
| Comment by Githook User [ 06/Feb/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}Message: |
| Comment by Githook User [ 04/Feb/19 ] |
|
Author: {'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com', 'username': 'vessy-mongodb'}Message: |
| Comment by Esha Maharishi (Inactive) [ 31/Jan/19 ] |
|
Just to be clear, I would like to re-state that implementing the ban this way is just to future-proof the code:
|
| Comment by Esha Maharishi (Inactive) [ 31/Jan/19 ] |
|
judah.schvimer, ah, I did not know about that check for canAcceptWritesForDatabase in abortActiveTransaction. As far as I can tell, that should prevent the race for a direct abortTransaction command. Perhaps Vessy can implement the ban under this ticket by doing a similar check in beginOrContinue, and I can write tests for both the higher transaction number and direct abortTransaction command under |
| Comment by Judah Schvimer [ 31/Jan/19 ] |
|
I think beginOrContinue acquiring the RSTL in mode IX before aborting the transaction makes sense. The lock acquisition will be an interrupt point already. With the RSTL held we need to check canAcceptWritesForDatabase(opCtx, "admin") to see if we're still a primary. For abortTransaction, we already check if we're primary here. esha.maharishi, does that not account for the race you're concerned about for the abortTransaction command? |
| Comment by Siyuan Zhou [ 31/Jan/19 ] |
|
Talked with esha.maharishi offline. We agree that an abortTransaction on prepared transaction on secondary is also possible. Aborting an unprepared or prepared transaction by either a higher txnNumber or abortTransaction command are faced with the same race mentioned by Esha. One possible solution is to have beginOrContinue hold in RSTL IX lock and check for interrupt. Esha also mentioned that it's possible for a participant primary to get an abortTransaction from the user even though the coordinator is about to commit the transaction. This will corrupt the data, but that's intentional since we want to give users a way to fix the transactions when the coordinator is down. |
| Comment by Siyuan Zhou [ 31/Jan/19 ] |
|
esha.maharishi, does your argument apply to prepared transactions on secondary? As you said, the window where a transaction is in-progress is very short and protected by the session checkout, but aborting a prepared transaction by a user request on secondary also seems possible. |
| Comment by Esha Maharishi (Inactive) [ 31/Jan/19 ] |
|
One thing to consider for #1 is that commandCanRunHere is checked here in ServiceEntryPointCommon::execCommandDatabase, but the node may transition to secondary in between that and calling beginOrContinue on the TransactionParticipant. From my understanding of I believe this wouldn't happen today, because secondary oplog application checks out the session and puts the transaction into prepare before checking the session back in. So, by the time the user request is able to call beginOrContinue on the transaction, the transaction should no longer be abortable. However, to future-proof the code, another option for implementing the ban could be to hold the global IX lock in TransactionParticipant::beginOrContinue across checking whether the node is primary and calling _setNewTxnNumber. (If the node is not primary, beginOrContinue should uassert.) Since implementing the ban this way would address both issues, it may be worth doing just what I suggested instead of doing both. |
| Comment by Vesselina Ratcheva (Inactive) [ 30/Jan/19 ] |
|
tess.avitabile that's a good point. Thanks! max.hirschhorn that sounds very reasonable to me. I should be able to remove the usages in a separate commit, which I hope would make it easier to put them back in in the future. |
| Comment by Max Hirschhorn [ 30/Jan/19 ] |
I'd vote for removing the CheckReplDBHashInBackground hook from the YAML suite files for the time being. The background thread continuously spawns a mongo shell process to run run_check_repl_dbhash_background.js, i.e. it spawns a new one as soon as the earlier one exits. Putting an early return in will cause that happen more frequently than it already does, which feels wasteful and likely chattier for our logs. |
| Comment by Tess Avitabile (Inactive) [ 30/Jan/19 ] |
|
Looks great! I think we may not need to do (3), since transactions against secondaries will fail after (1). |
| Comment by Vesselina Ratcheva (Inactive) [ 30/Jan/19 ] |
|
I have so far identified several work items: 1) Disallow secondary transactions on the server side even when test commands are enabled. I intend to leverage the code here and remove the exception for test commands. I will also update the AllowedOnSecondary rules for commit and abort here. Feel free to chime in if you feel there's anything missing here. |
| Comment by Judah Schvimer [ 23/Jan/19 ] |
|
This ticket should first disable background dbhash testing. Background dbhash code shouldn't be completely removed, since it will be replaced with a separate snapshot read mechanism, and we'll leave it to the STM team to remove the code they want removed. |