[SERVER-43076] What is the correct way to access the OperationContext in the commit method of a RecoveryUnit::Change implementation? Created: 29/Aug/19 Updated: 29/Oct/23 Resolved: 20/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Alexander Taskov (Inactive) | Assignee: | Henrik Edin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Execution Team 2022-09-19, Execution Team 2022-10-03 | ||||||||
| Participants: | |||||||||
| Description |
|
The opCtx we were under when the handler was created is not valid during commit. What is the prescribed way to access the current opCtx? The approach we are currently using is to retrieve the opCtx from the current client: auto opCtx = cc().getOperationContext();
|
| Comments |
| Comment by Githook User [ 20/Sep/22 ] |
|
Author: {'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}Message: Supports multi-document transactions where the OperationContext instance may change. |
| Comment by Kaloian Manassiev [ 12/Sep/22 ] |
|
I think it is more like that onCommit/onRollback shouldn't be allowed to access the transaction context, because it is too late for them to be making any changes to it or even to introspect it. We just need the "OS" part of the context, which is about taking locks. |
| Comment by Henrik Edin [ 12/Sep/22 ] |
|
I think it would be a nice cleanup to separate the OperationContext into a context for the whole transaction and one for the individual operations within that transaction. But with such a change code would still need to access both as meaningful state exist on both sides. I can buy an argument that in such a world commit/rollback handlers should only need the transaction context part (as they operate on transactions). We currently have lots of decorations living on the OperationContext that should probably be moved to the "TransactionContext". It is going to be some work to fix all this and I think the first step is to simply make it possible to access the correct OperationContext pointer in commit/rollback handlers. Over time, when we have created a "TransactionContext" we can move over to that system. |
| Comment by Kaloian Manassiev [ 12/Sep/22 ] |
|
Just related to the previous comment (doesn't solve the overall problem of this ticket): The OperationContext is a fundamental part of our "tread management" infrastructure and we should not have blocking operations not taking one, both for diagnostics/tracking purposes and for interruptability. If there are places taking ResourceLocks that are not safe to be interrupted, they should be explicitly marked as such. |
| Comment by Louis Williams [ 12/Sep/22 ] |
|
I feel that we need to separate OperationContext into the parts that represent a single operation running on a single thread and the parts that represent its longer-lived transactional state that can inhabit multiple OperationContexts. Things like Locker and RecoveryUnit can be used on multiple OperationContexts and should probably have their own separate state. Part of the reason why we were capturing OpCtx in the LogOpForShardingHandler was to acquire a ResourceLock, which required an OperationContext argument. |
| Comment by Henrik Edin [ 07/Sep/22 ] |
|
Reopening this and proposing that we implement an idiomatic way to get access to the OperationContext in commit/rollback handlers. It is already common that we capture the OperationContext in commit handlers. The OperationContext is used to access various decorations or validate some lock state. We also require the OperationContext to be able to perform writes to the CollectionCatalog which can be required in commit/rollback handlers. It seems fragile that this pattern works except in the context of multi-document transactions. When more and more support is added within multi-document transactions new code paths that capture an OperationContext may execute in a way that is currently not safe. |
| Comment by Daniel Gottlieb (Inactive) [ 05/Sep/19 ] |
|
There's not currently a better way to do this. Looking at the provided code snippet, I see the operation context is only being used to acquire a lock. One alternative is to instead grab that lock prior to adding a _stmt to the LogTransactionOperationsForShardingHandler. Assuming that lock participates in two phase locking (2PL), that will guarantee the lock is held for the entirety of the transaction. The obvious trade-offs here:
I'm speculating quite a bit at this point; I really know nothing about these structures. If the CSRLock is for protecting state on the MigrationSourceManager, perhaps that state can versioned such that a lock is not required (or it can be turned into a mutex)? If the version meaningfully changed since the LogTransactionOperationsForShardingHandler was constructed, that would presumably end up in this no-op path? Just brainstorming here, but an API change that's theoretically possible; we could add a "RecoveryUnit::preCommitHook" that does take in an OperationContext and if any of those hooks fail, the storage transaction can safely abort. Would this in essence solve the problem? We'd obviously have to think whether that's something we'd actually want to add. |
| Comment by Alexander Taskov (Inactive) [ 05/Sep/19 ] |
|
The code that brought up this question is here: https://github.com/mongodb/mongo/blob/master/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L167 Basically, we're in the commit method of the Change handler that was created here. The opCtx that was valid during the creation is not the same one in effect when commit is called so we can't cache it then. We are currently just pulling the opCtx from the current client but we're wondering if there is a better way to do this. |
| Comment by Daniel Gottlieb (Inactive) [ 05/Sep/19 ] |
|
Poke kaloian.manassiev alex.taskov if you have any details worth sharing. |
| Comment by Daniel Gottlieb (Inactive) [ 30/Aug/19 ] |
|
I agree that having an OperationContext would be useful, but it's a bit dangerous to expose. onCommit handlers must not fail and the main uses of an OperationContext (from my perspective) is to acquire locks or to write to storage. Both of which are expected to fail. Do you mind sharing the intended use-case? |
| Comment by Kaloian Manassiev [ 30/Aug/19 ] |
|
louis.williams, I suggested that Alex files this ticket, but didn't mean it to be a question - rather for the storage team to provide a way for accessing the committing OpContext from within the commit/abort methods of the RecoveryUnit.
With the introduction of transactions, the recovery unit can now actually span multiple operation contexts and because of this, it is no longer safe to store a pointer to the parent OpContext. This is why we brought up this question. |
| Comment by Louis Williams [ 29/Aug/19 ] |
|
alex.taskov, the RecoveryUnit::Change handlers are executed by the RecoveryUnit when a WriteUnitOfWork is committed. Additionally, an OperationContext owns a RecoveryUnit, so it should always outlive the RecoveryUnit. Since the OperationContext outlives its RecoveryUnit, it should always be valid to capture the OperationContext pointer in the Change lambda. See this example. The SERVER project is only for reporting bugs or feature requests. Do you have a link to code or a specific bug where you are experiencing this problem? |