[SERVER-61251] Ensure long running storage engine operations are interruptible Created: 04/Nov/21  Updated: 06/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Storage Engines Team
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-83186 Writers can get stuck in cache evicti... Open
is related to SERVER-64982 Extended lack of availability caused ... Blocked
is related to SERVER-56756 Primary cannot stepDown when experien... Closed
is related to WT-10958 Session API to roll-back a transactio... Open
is related to SERVER-71520 Dump all thread stacks on RSTL acquis... Closed
is related to SERVER-77172 "abortExpiredTransactions" thread can... Backlog
Assigned Teams:
Storage Engines
Participants:

 Description   

Long running storage engine operations don't have interrupt points and thus can block step down.



 Comments   
Comment by Judah Schvimer [ 22/May/23 ]

Thanks alexander.gorrod@mongodb.com, it was a general idea from a Workload Management discussion. I don't think it's critical.

Comment by Alexander Gorrod [ 22/May/23 ]

Is that a question for WiredTiger, judah.schvimer@mongodb.com? As in, could a transaction be paused and restarted? The answer is: not safely. We built a mechanism a bit like that for prepared transactions, and the corner cases are numerous and annoying (we are in fact still chasing them).

We are taking small steps in that direction, but I expect it will be a fair while before it's a tractable amount of work in WiredTiger. Let me know if it's important enough to justify doing a bit more design than my gut response and we'll give a more complete answer.

cc mick.graham@mongodb.com

Comment by Judah Schvimer [ 19/May/23 ]

A related question is if long running WT operations can be yieldable, especially for Workload Management once we start prioritizing amongst running queries.

Comment by Louis Williams [ 19/Apr/23 ]

I wanted to link to a comment that sue.loverso@mongodb.com left on WT-10892. In an attempt to reduce the overhead and risk of frequent interrupt checking in fast code paths, what we need at a minimum is to check for interrupts in the slow WiredTiger code paths that can block indefinitely, specifically in the eviction worker loop. What we want is to be able to pull operations out of the eviction loop, since that is where we find operations blocked inside WiredTiger most of the time.
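A minimal sketch of the idea above, assuming a blocking wait that only polls for interruption on the slow path. All names here (CacheState, waitForCacheSpace, WaitResult) are hypothetical and are not WiredTiger APIs; the real eviction worker loop is considerably more involved.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical types for illustration only; not part of WiredTiger.
enum class WaitResult { CacheAvailable, Interrupted };

struct CacheState {
    std::mutex mu;
    std::condition_variable cv;
    bool spaceAvailable = false;  // set by whoever frees cache space
};

// Slow path only: block until cache space is available, waking every
// pollInterval to check whether the caller has been interrupted.
// The fast path (space already available) never reads the interrupt flag.
inline WaitResult waitForCacheSpace(CacheState& cache,
                                    const std::atomic<bool>& interrupted,
                                    std::chrono::milliseconds pollInterval) {
    assert(pollInterval.count() > 0);
    std::unique_lock<std::mutex> lk(cache.mu);
    while (!cache.spaceAvailable) {
        if (interrupted.load(std::memory_order_acquire)) {
            return WaitResult::Interrupted;  // pull the operation out of the loop
        }
        cache.cv.wait_for(lk, pollInterval);
    }
    return WaitResult::CacheAvailable;
}
```

Polling only inside the wait loop keeps the cost of the check off the fast code paths, which is the trade-off the comment describes.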

Comment by Dianna Hohensee (Inactive) [ 18/Apr/23 ]

Some additional information from working on SERVER-70201, to interrupt WT::compact from the MDB layer.

We ended up passing a pointer to the opCtx into the storage layer for WT::compact interrupt checks. So if the opCtx in the MDB layer gets interrupted, then WT::compact can see that eventually and quit. This could be expanded generally for all MDB operations (all have an opCtx) accessing the WT layer.

I think this solution addresses the concern that operations would be immediately resubmitted to WT and nothing would be gained: operations in the MDB layer are interrupted today with positive effect – not just restarted immediately with no change. 
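The opCtx approach described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the SERVER-70201 implementation: OpCtx, CompactResult and compactBlocks are hypothetical stand-ins, and the interrupt check interval is an invented parameter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the server's operation context.
struct OpCtx {
    std::atomic<bool> killed{false};
    void markKilled() { killed.store(true, std::memory_order_release); }
    bool isKilled() const { return killed.load(std::memory_order_acquire); }
};

struct CompactResult {
    std::size_t blocksProcessed;
    bool interrupted;
};

// Hypothetical storage-layer routine. It receives a pointer to the caller's
// operation context and checks it every `checkEvery` units of work, so an
// interrupt raised in the upper layer is eventually observed and honored.
inline CompactResult compactBlocks(const OpCtx* opCtx,
                                   std::size_t totalBlocks,
                                   std::size_t checkEvery) {
    assert(checkEvery > 0);
    std::size_t done = 0;
    while (done < totalBlocks) {
        if (opCtx != nullptr && done % checkEvery == 0 && opCtx->isKilled()) {
            return {done, true};  // quit early, reporting how far we got
        }
        ++done;  // stand-in for one unit of real work
    }
    return {done, false};
}
```

Accepting a null pointer keeps callers without an operation context working unchanged, which matters if the pattern is extended beyond compact.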

Comment by Alexander Gorrod [ 03/Apr/23 ]

Notes from a group conversation about this:

The primary use case to be addressed here is short running operations that can sometimes be held for a long time by WiredTiger.

Ideally, WiredTiger would provide a mechanism by which MongoDB could notify us that an operation should be interrupted. keith.smith@mongodb.com indicated that there is an existing callback mechanism in place for compact that can trigger an interrupt.

I indicated that it's important for server and storage engines engineers to collaborate on this work. It wouldn't be generally beneficial to give up on operations if the response is for the server to resubmit the same work to WiredTiger in a new transaction. judah.schvimer@mongodb.com indicated that a Storage Execution engineer would be most suitable from the server team to work on this.

steve.kuhn@mongodb.com If you think this work is worthwhile, we should create a WiredTiger ticket to track the Storage Engines work and figure out how to get the time scheduled.

Comment by Alexander Gorrod [ 13/Mar/23 ]

I believe that compact is a different case to the others referenced in this ticket. Compact is a complex, long running command in WiredTiger. It is reasonable to add interrupt points into compact and return early if the caller wants that behavior.

The other cases described here, where a WT_SESSION::commit or WT_SESSION::abort can take a long time to complete, are different. There are two possible causes for those APIs to take a long time: (1) they are resolving a prepared transaction, which can be expensive; (2) the WiredTiger cache is over-subscribed (generally due to being overwhelmed by too much concurrent activity), and the transaction resolution is being tasked with helping ensure the cache doesn't become over-subscribed (which can lead to processes being killed by the operating system).

steve.kuhn@mongodb.com Could we chat about what to do with this ticket? It bounces back to us periodically, and it would be nice to have a concrete plan. Is that something you could help create?

Comment by Fausto Leyva (Inactive) [ 21/Feb/23 ]

One potential idea is to generalize the solution from compact when we made it interruptible. 

Comment by Lingzhi Deng [ 09/Nov/21 ]

For the help ticket, mongo::WiredTigerRecoveryUnit::_commit and probably mongo::WiredTigerRecoveryUnit::_abort too. But I am not sure what it means to interrupt an abort. More generally though, I think we want a solution to make all storage operations interruptible, or a way at the MongoDB layer to avoid being blocked on a storage operation.

Comment by Gregory Noma [ 09/Nov/21 ]

lingzhi.deng which operation was this? Any additional information that would be useful here?
