[SERVER-37689] Make recovery from query yield interruptible Created: 20/Oct/18  Updated: 23/Oct/18  Resolved: 23/Oct/18

Status: Closed
Project: Core Server
Component/s: Internal Code, Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-37244 Implement checkOutSessionForKill API ... Closed
Related
related to SERVER-37682 Do not yield locks in an Uninterrupti... Closed
is related to SERVER-37711 Revert the `ReplicationStateTransitio... Closed
Participants:

 Description   

PROBLEM
The server query yielding mechanism uses the saveLockStateAndUnlock/restoreLockState combination of calls in order to get the lock state free all its acquired locks and then re-acquire them.

The re-acquisition of the locks currently must run with UninterruptibleLockGuard because otherwise the interruption exception will unwind to the lock manager's RAII classes destructors and they will try to unlock an already unlocked resource id.

Not being able to interrupt the restoreLockState call though is problematic because it can lead to deadlocks, which would otherwise get resolved automatically.

PROPOSED SOLUTION
I propose making restoreLockState interruptible by introducing an internal variable on the Locker, called boost::optional<LockSnapshot> _yieldedLocks and making saveLockStateAndUnlock place the locks it yielded inside this variable instead of returning them as an output parameter.

restoreLockState will use the values in _yieldedLocks to reacquire the locks back and if it gets interrupted, the Locker::unlock calls can check if a lock was yielded inside the _yieldedLocks list.



 Comments   
Comment by Kaloian Manassiev [ 23/Oct/18 ]

Based on in-person discussion, it was decided that the replication team will make the replication state transitions use mechanism other than the Global X lock in order for them to not require all sessions to be checked-in before the transition can proceed. As a result of this, uninterruptible recovery from yield will no longer be a problem.

I am closing this ticket as Won't Fix in lieu of reverting the following commits, which introduced the PreventCheckingOutSessionsBlock functionality:

Comment by Andy Schwerin [ 22/Oct/18 ]

Yeah, the callers may depend on the locks still being held. Session yielding might work.

Comment by Kaloian Manassiev [ 22/Oct/18 ]

No, in the case where the operation context is interrupted while waiting for lock re-acquisition, the call stack will unwind with the locks not held.

Are you saying it is a requirement because something up the call stack might currently depend on the locks being held?

An alternative option then is that the yielding should yield the session as well.

Comment by Andy Schwerin [ 22/Oct/18 ]

kaloian.manassiev, I don't understand this proposal. It is a requirement, no matter how unfortunate, that a caller recover all held locks at the end of yield(). Would your proposal preserve this?

Comment by Kaloian Manassiev [ 20/Oct/18 ]

david.storch, geert.bosch - can you take a look at this and also at this comment, and let me know which one we should do?

Generated at Thu Feb 08 04:46:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.