[SERVER-31591] HMAC key refresh thread holds onto an OperationContext Created: 16/Oct/17  Updated: 08/Jan/24  Resolved: 02/Nov/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.0-rc1
Fix Version/s: 3.6.0-rc3

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-31101 WT table not dropped after collection... Closed
is related to SERVER-31573 WT table not dropped on primary after... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2017-11-13
Participants:

 Description   

The default key refresh time is 3 months: https://github.com/mongodb/mongo/blob/master/src/mongo/db/keys_collection_manager.cpp#L37

There's a background thread responsible for refreshing keys. That thread renews its OperationContext on each iteration. However, an iteration (I believe) can sleep for the full three months when newly negotiated keys are set to expire. Holding this OperationContext open can block WT tables from being dropped. Generally speaking, an OperationContext should be short-lived.
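The shape of the bug, and of the eventual fix, can be sketched with simplified stand-in types (these model the real classes but are not the MongoDB implementation): an OperationContext owns a RecoveryUnit, which pins a storage session for its whole lifetime, and a pinned session can block a WT table drop.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified model: a RecoveryUnit pins one storage session
// while it is alive; pinned sessions are what block a WT table drop.
struct RecoveryUnit {
    static inline int livePinnedSessions = 0;
    RecoveryUnit() { ++livePinnedSessions; }
    ~RecoveryUnit() { --livePinnedSessions; }
};

struct OperationContext {
    std::unique_ptr<RecoveryUnit> ru = std::make_unique<RecoveryUnit>();
};

// Buggy loop shape: one opCtx spans both the key refresh and the
// (potentially months-long) sleep, so the pinned session outlives the work.
int buggyIterationPinnedDuringSleep() {
    OperationContext opCtx;
    /* ... refresh keys using opCtx ... */
    int pinnedWhileSleeping = RecoveryUnit::livePinnedSessions;
    /* ... sleep for up to 3 months, still holding opCtx ... */
    return pinnedWhileSleeping;
}

// Fixed shape: scope the opCtx to the DB work only.
int fixedIterationPinnedDuringSleep() {
    {
        OperationContext opCtx;
        /* ... refresh keys using opCtx ... */
    }  // RecoveryUnit (and its session) destroyed here
    int pinnedWhileSleeping = RecoveryUnit::livePinnedSessions;
    /* ... long sleep, holding no storage resources ... */
    return pinnedWhileSleeping;
}
```

In the buggy shape the session stays pinned (count 1) across the sleep; in the fixed shape nothing is pinned (count 0) once the scoped context is destroyed.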



 Comments   
Comment by Githook User [ 02/Nov/17 ]

Author:

{'name': 'Randolph Tan', 'username': 'renctan', 'email': 'randolph@10gen.com'}

Message: SERVER-31591 Don't hold active RecoveryUnit while going to long sleep in HMAC key refresh
Branch: master
https://github.com/mongodb/mongo/commit/ba06ea5e2c4d5c54d6ae715691a306eb0254d7a4

Comment by Kaloian Manassiev [ 27/Oct/17 ]

I spoke with Eric more about this offline, and now I understand the problems with releasing the session inside abandonSnapshot. Because of this, I agree that the next best thing we can do is what Randolph proposed: use two different OperationContexts.

Comment by Eric Milkie [ 27/Oct/17 ]

I think for this one we should simply switch to using a timed condvar instead of the one provided by OperationContext.
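A plain timed condvar, as suggested here, would make the long sleep interruptible without holding any storage resources. A minimal sketch (the class name and interface are illustrative, not MongoDB code):

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Interruptible sleeper built on a plain std::condition_variable rather
// than the one provided by OperationContext: the wait holds only a mutex,
// never a RecoveryUnit or WT session.
class RefreshSleeper {
public:
    // Returns true if woken by shutdown, false if the timeout elapsed.
    bool sleepFor(std::chrono::milliseconds dur) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, dur, [this] { return _shutdown; });
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _shutdown = false;
};
```

The trade-off Spencer raises below is that this bypasses the unified OperationContext interruption machinery, so shutdown and stepdown would need to signal the condvar explicitly.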

Comment by Eric Milkie [ 27/Oct/17 ]

I actually tried just returning the session to the cache when txnClose() was called. Unfortunately, the code is currently architected such that you can hold cursor objects past the end of a transaction, and the cursor needs access to the session object in order to find its way back into the session's cursor cache. We'd have to find each and every place that destructs cursors outside of a WriteUnitOfWork (or implicit read transaction), and that sounds like a very lengthy and difficult task.
Also, if you return a session to the cache without all of its cursors returned to it, invariants fire. I'm not sure I could even design something that could support having the cursors come back later (or have the cursors inserted into a different session's cursor cache from their original one).

Comment by Kaloian Manassiev [ 27/Oct/17 ]

daniel.gottlieb told me that you tried destroying and recreating _session in abandonSnapshot, but hit some bugs. Is it an option that we try to make that work?

Comment by Eric Milkie [ 27/Oct/17 ]

What you want to have happen is to have the destructor run for the _session member variable of WiredTigerRecoveryUnit, and the only way to do that right now is to destruct the recovery unit itself.

Comment by Eric Milkie [ 27/Oct/17 ]

abandonSnapshot() only ends the current transaction; it does not return the session to the pool.

Comment by Kaloian Manassiev [ 27/Oct/17 ]

The RecoveryUnit is what holds the WT session, and it should be released when the last global lock on an OpCtx is released. Therefore I do not understand why we need to shorten the scope of an operation context.

daniel.gottlieb, is it possible that there is a bug in abandonSnapshot?

Comment by Randolph Tan [ 17/Oct/17 ]

I have thought about this more; one relatively easy workaround for the key refresh code is to use a separate OperationContext for the sleep. Fortunately, the code that uses the OperationContext is very contained and can be easily scoped:

one operation context that is scoped to this block:
https://github.com/mongodb/mongo/blob/r3.5.13/src/mongo/db/keys_collection_manager_sharding.cpp#L237-L256

and the other that is scoped to this block:
https://github.com/mongodb/mongo/blob/r3.5.13/src/mongo/db/keys_collection_manager_sharding.cpp#L286-L298
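The two-OperationContext proposal can be sketched with runnable stand-in types (simplified, not the real MongoDB API): each context pins storage while alive, and the work context and the sleep context never coexist, so nothing is pinned across the long wait.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <memory>

// Stand-in for ServiceContext that counts live operation contexts and
// records the high-water mark, to show the two scopes never overlap.
struct FakeServiceContext {
    int live = 0;     // contexts currently alive
    int maxLive = 0;  // most contexts ever alive at once

    struct OpCtx {
        FakeServiceContext* svc;
        explicit OpCtx(FakeServiceContext* s) : svc(s) {
            svc->maxLive = std::max(svc->maxLive, ++svc->live);
        }
        ~OpCtx() { --svc->live; }
    };

    std::unique_ptr<OpCtx> makeOperationContext() {
        return std::make_unique<OpCtx>(this);
    }
};

// One refresh iteration per the proposal: opCtx #1 scoped to the DB work,
// opCtx #2 scoped to the (interruptible) sleep.
void refreshIteration(FakeServiceContext* svc,
                      const std::function<void()>& doRefresh,
                      const std::function<void()>& doSleep) {
    {
        auto workCtx = svc->makeOperationContext();
        doRefresh();
    }  // storage resources released before sleeping
    {
        auto sleepCtx = svc->makeOperationContext();
        doSleep();
    }
}
```

Because the sleep context is created fresh after the work context is destroyed, at most one context (and hence one pinned session) exists at any time.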

Comment by Andy Schwerin [ 17/Oct/17 ]

I discussed this problem with mira.carey@mongodb.com a couple of weeks ago. I think we're going to need to think about the unit of interruptibility as we tease apart the pieces of "Client", "OperationContext" and "TransactionSession" into their appropriate homes. It's a little sticky, especially for these long-lived service threads. I think it's going to remain gross in 3.6, but have to get better during 3.8.

Comment by Spencer Brody (Inactive) [ 16/Oct/17 ]

So the functionality to do condition variable sleeps through an OperationContext is relatively new (see SERVER-25062) and something we were hoping to move more code towards, not less, to better unify our handling of interruption, deadline expiration, and shutdown. Is there a way we could change the code for sleeping on an OperationContext to release whatever resources are associated with the OperationContext (presumably the RecoveryUnit?) while it's asleep?

CC schwerin

Comment by Daniel Gottlieb (Inactive) [ 16/Oct/17 ]

Yeah, coding style issues aside, a raw delete is sufficient. The other consideration is the gnarly surprise awaiting a future maintainer who tries to access the recovery unit in some other part of the key refreshing work.

Comment by Randolph Tan [ 16/Oct/17 ]

daniel.gottlieb Oh, I was not paying attention. I thought releaseRecoveryUnit destroys it. In this particular case, I believe it should be sufficient to just destroy the recovery unit since it is not going to be used after it wakes up again anyway. Or am I missing something?

Comment by Daniel Gottlieb (Inactive) [ 16/Oct/17 ]

renctan It is the recovery unit, but releasing it is not sufficient. The instance must be destroyed/deleted (as opposed to simply being re-attached after the sleep):
https://github.com/mongodb/mongo/blob/068afad15322f6e768cafdfde76dfa92575a17cf/src/mongo/db/operation_context.cpp#L384-L386

More color for the curious:
The member that must have its destructor run: https://github.com/mongodb/mongo/blob/068afad15322f6e768cafdfde76dfa92575a17cf/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.h#L127

More directly:
https://github.com/mongodb/mongo/blob/068afad15322f6e768cafdfde76dfa92575a17cf/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L388-L391

which calls:
https://github.com/mongodb/mongo/blob/068afad15322f6e768cafdfde76dfa92575a17cf/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp#L328
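The release-versus-destroy distinction can be shown with a simplified model (names mirror the real API, but the types are stand-ins): releaseRecoveryUnit only detaches the RecoveryUnit from the OperationContext, while the session goes back to the cache only when the RecoveryUnit's destructor actually runs.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the WT session cache: tracks how many sessions are
// currently checked out (and therefore able to block a table drop).
struct SessionCache {
    int checkedOut = 0;
};

struct RecoveryUnit {
    SessionCache* cache;
    explicit RecoveryUnit(SessionCache* c) : cache(c) { ++cache->checkedOut; }
    ~RecoveryUnit() { --cache->checkedOut; }  // session returned only here
};

struct OperationContext {
    std::unique_ptr<RecoveryUnit> _ru;
    explicit OperationContext(SessionCache* c)
        : _ru(std::make_unique<RecoveryUnit>(c)) {}
    // Detach without destroying: the session stays checked out, which is
    // why release alone is not sufficient for the refresh thread.
    RecoveryUnit* releaseRecoveryUnit() { return _ru.release(); }
};
```

Releasing leaves the session checked out; only deleting the released RecoveryUnit returns it to the cache.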

Comment by Randolph Tan [ 16/Oct/17 ]

daniel.gottlieb What is the exact thing that is blocking WT operations? Is it just the RecoveryUnit, or the entire OperationContext? If it's the former, would calling releaseRecoveryUnit before it goes to sleep help?

Comment by Daniel Gottlieb (Inactive) [ 16/Oct/17 ]

acm There were a few cases of OperationContexts being held open for indefinitely long times that we're trying to squash out for 3.6. If you don't think this can be prioritized in time, please let me know!

Generated at Thu Feb 08 04:27:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.