[SERVER-36498] Remove 'config.transactions' entries with 'state' field on shutdown in fCV 4.0 Created: 07/Aug/18 Updated: 29/Oct/23 Resolved: 20/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Upgrade/Downgrade |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.9 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_upgrade_downgrade | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||
| Sprint: | Sharding 2019-02-11, Sharding 2019-02-25 | ||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||
| Comments |
| Comment by Githook User [ 20/Feb/19 ] |
|
Author: {'name': 'Jack Mulrow', 'username': 'jsmulrow', 'email': 'jack.mulrow@mongodb.com'}Message: |
| Comment by Siyuan Zhou [ 11/Feb/19 ] |
|
> I don't think there is a simple way to differentiate retryable writes config.transactions entries from committed transaction entries Traversing the oplog chain and checking for applyOps will help us differentiate them. 4.0 version of transaction commit is an applyOps command, but applyOps is not allowed for retryable writes. Does that help us here? For large transactions, we will have a new field for those oplog entries in transactions to differentiate them from retryable writes. |
| Comment by Jack Mulrow [ 05/Feb/19 ] |
|
kaloian.manassiev, I realized a problem with using checkOutSessionForKill() is the method throws if the session isn't in the SessionCatalog. That means it can't be used to check out a session that has an entry in config.transactions but hasn't been loaded into memory on the current primary. How do you think I should get around this? I can use an AlternativeClientRegion to check out the session with MongodOperationContextSession which creates the session if it doesn't exist in-memory, but I know you'd like to avoid the ACR. Instead, I could add a way to check out a session without installing it onto the opCtx that also creates the session if necessary (possibly without killing it too), but I'm worried that might make it easier to misuse the SessionCatalog. |
| Comment by Judah Schvimer [ 04/Feb/19 ] |
|
pavithra.vetriselvan should be fixing the kAborted case in |
| Comment by Jack Mulrow [ 04/Feb/19 ] |
Yeah exactly. To get around that I either need to get a new opCtx or use checkOutSessionForKill, but that requires a KillToken, which as far as I know can only be gotten by killing the session. It seems possible we could add a third way to check out a session that doesn't require killing or installing it on the opCtx, but that feels unnecessary.
Yeah, that should guarantee the session's txnState is kCommitted if the session was most recently used to commit a transaction, with prepare or without. I'm realizing we don't check if the transaction record's state is "kAborted" though, so if the session aborted after prepare, refreshing won't transition the session to the kAborted state. I'll add a test case for this to know for sure, but I think I'll have to fix that as part of this patch or distinguish aborted sessions from retryable writes in a different way. |
| Comment by Judah Schvimer [ 04/Feb/19 ] |
|
This all seems good to me. I'd like to avoid killing sessions on upgrade, if possible. I'd also prefer not to use checkOutForKill if we're not killing. That seems confusing and misleading. 3.2.5 invalidating makes sense. |
| Comment by Kaloian Manassiev [ 04/Feb/19 ] |
From your description jack.mulrow, you are just modifying the on-disk state, but not actually putting the participant in the correct state, expecting the next refresh to do that. And I think that's the better way to do it. Otherwise you need to write new code for "patching-up" the participant, which I think is wasteful given that we have the refresh logic. |
| Comment by Kaloian Manassiev [ 04/Feb/19 ] |
|
For 2.2.1 - you are suggesting to use killSession because of problem (1) that you listed, right? Is the problem there that setFCV can be called on a session (which is different from the session you are operating on)? I don't love it that we unnecessarily kill sessions though, but I prefer killing the session over using AlternativeClientRegion - I think the latter is very error prone. For 3.1.2.3 - it is because of this that the transaction participant will indicate that it is part of transaction or retryable write, right? For question (3) - killing a session just interrupts the operation context, but doesn't have anything to do with invalidation. Other than that, jack.mulrow's logic seems good to me. judah.schvimer, for your question (2) - scanSessions only scans what's in memory, but there could be sessions on disk, which have not been accessed and need to be fetched in memory, which requires check out. |
| Comment by Jack Mulrow [ 04/Feb/19 ] |
I was planning on calling it after getting a set of all sessions that potentially need to have their entries modified - for upgrade: all sessions, for downgrade: just sessions that have a "state" field in their config.transactions entry. This all happens after FCV is set to downgrading.
Yeah, good point. We'll have to delete for any txnState that means the session's config.transactions entry points to an oplog entry not supported by 4.0, so kAbort and kCommit. I don't think it's possible for the session to be in kPrepared, since they're checked out after the node is in the downgrading state, so I might add an invariant for that.
The main reason is code simplicity. Checking out for kill doesn't install the session on the opCtx, which makes modifying the transaction table entry while the session is checked out a lot simpler. I was able to make things work using the normal checkout, but I had to create new opCtxs in AlternateClientRegions for both the checkout and then then the modification (to work around session id invariants), which was a little awkward. If you'd rather not kill the sessions though, I'm fine going with the ACR approach for now and we can come back to this in the CR if we don't like how it ends up.
I'm actually not sure if we do need to invalidate the session after the refactor to the TransactionParticipant. I remember in 3.6 for retryable writes the session needed to be invalidated after on-disk changes to force the next thread that checks out the session to refresh its in-memory state, but that may not still be true/required. kaloian.manassiev, do you know if invalidation is necessary?
Got it, I'll try not to kill on upgrade for the first review patch, since like I put above it's just for code simplicity.
Not really, since there might be entries in config.transactions that aren't in the session catalog when we're upgrading/downgrading, so I think we have to scan the whole collection to be sure we don't miss a session.
Yeah that was my thinking too. And if it turns out it's unnecessary, it will be easy to remove. |
| Comment by Judah Schvimer [ 02/Feb/19 ] |
|
2.2.1) At what point are we actually calling kilSession? 1) I think killing all sessions on upgrade is undesirable, but fine if needed. I'm not sure why it's needed though. Downgrade it seems totally fine. |
| Comment by Jack Mulrow [ 01/Feb/19 ] |
|
Ok it sounds like we're fine with the tradeoffs, so I've been thinking about how to implement this, and here's what I have so far. The main idea is to guard the "state" field by FCV and instead of removing the field on downgrade, we delete the transaction table documents that contain it. Proposed implementation:
judah.schvimer, kaloian.manassiev, does this sound reasonable? If so, I have a couple implementation questions:
|
| Comment by Judah Schvimer [ 30/Jan/19 ] |
|
1) I think losing retryability on downgrade is fine. It's a rare operation known to have limitations. |
| Comment by Jack Mulrow [ 30/Jan/19 ] |
|
judah.schvimer, I've started looking into removing entries with "state" on FCV downgrade and only adding "state" in FCV 4.2, and I have a few questions: 1) If we delete these records on downgrade, we'll lose the ability to retry commit for their corresponding transactions after we upgrade. Is that acceptable behavior? If not, can you think of a way around this? 2) If we delete the records, clients will be able to re-use txnNumbers for the sessions associated with them, e.g. a client could commit a transaction in 4.2, downgrade, then re-use that txnId (or use a lower txnNumber) for some other operation. Is that ok? I don't think this would necessarily be dangerous, but it is odd behavior. 3) It looks like we also give a "state" field to entries for transactions that committed w/o prepare, so when we downgrade, we'll delete them and lose the ability to retry commit for them too. Is that acceptable? Or should I find a way to avoid deleting them (like one of the options I give in 4))? 4) To make sure the entries for transactions that commit w/o prepare match the 4.2 format, when we upgrade FCV, we'll have to give them a "state" field (or delete them entirely). The only ways I can think of to distinguish them from retryable writes entries are to manually look up the oplog entry the transaction record points to or to check out its corresponding session. Do you have a preference between these options, or can you think of a better way? 5) Alternatively, we could address 3) and 4) by never adding "state" to the entries for transactions that commit w/o prepare, so they're not affected by the 4.2 upgrade/downgrade at all. From what I can tell, "state" only matters for prepared transactions, but I'm not familiar enough with how we use it to say for sure. What do you think? |
| Comment by Judah Schvimer [ 07/Jan/19 ] |
|
An alternative to consider is removing the entries on fCV downgrade and only adding a 'state' field in fCV 4.2 |
| Comment by Judah Schvimer [ 14/Nov/18 ] |
|
renctan pointed out that any entries that reference problematic oplog entries will contain a 'state' field. I cannot recall why we decided to clear the entire table originally. It seems safe to just remove entries with a 'state' field on shutdown in fCV 4.0 or on fCV downgrade if we change the code to only add a 'state' field in fCV 4.2. |
| Comment by Judah Schvimer [ 14/Nov/18 ] |
|
A problem with not clearing them all is that the 'config.transactions' table could reference 'commitTransaction' or 'abortTransaction' oplog entries that 4.0 does not recognize. |
| Comment by Randolph Tan [ 14/Nov/18 ] |
|
judah.schvimer This sounds like a breaking change. What if we just remove config.transactions documents that have a state field? |
| Comment by Judah Schvimer [ 12/Nov/18 ] |
|
Yes. The idea was that shutting down a 4.2 server in fCV 4.0 will mean you will no longer be able to retry those writes. I think you're right that this could lead to a user/driver thinking it's safe to retry a write when it is not safe. That said, if an entry has "state=committed" its pointer could point to a 'commitTransaction' oplog entry that a 4.0 binary would crash trying to parse. If an entry has "state=aborted" it's unclear what we would change it to for 4.0 to properly understand it since we have no record of the previous version of the entry, which is what a 4.0 binary would have (since 4.0 doesn't update the transactions table on abort). renctan, any ideas for how to make it clear to a 4.0 node that writes on a given session are unsafe to retry? |
| Comment by Randolph Tan [ 12/Nov/18 ] |
|
Does empty mean removing all documents in config.transactions? Does this mean you will no longer be able to safely retry writes when a shutdown occurs? |
| Comment by Judah Schvimer [ 24/Sep/18 ] |
|
When complete, please file a ticket to remove any upgrade/downgrade code unnecessary in MongoDB 4.4 with the "4.3 Required" fixVersion and the "replication-42-fcv-checks" label. |