[SERVER-75288] Investigate whether the stepdown killop thread should kill operations that hold the RSTL Created: 24/Mar/23  Updated: 19/Jul/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Samyukta Lanka Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 1
Labels: former-quick-wins, repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-75285 Deadlock between ShardsvrCheckMetadat... Closed
is related to SERVER-78662 Deadlock with index build, step down,... Closed
is related to SERVER-71198 Assert that unkillable operations tha... Backlog
Assigned Teams:
Replication
Sprint: Repl 2023-04-17, Repl 2023-05-01, Repl 2023-05-15
Participants:
Linked BF Score: 135

 Description   

Right now the killop thread currently kills operations that took the global lock in a mode conflicting with writes. We did not kill operations that held the RSTL, because at the time we added the kill op thread, reads held the RSTL (this is safe because long running reads would periodically yield). This gave a better user experience because otherwise readers would have to handle interruption during failovers.

After lock free reads, many reads no longer take the RSTL. So, we should be able to start killing operations that take the RSTL on stepdown.

This has the benefit of preventing future deadlocks in situations where threads take the global lock in IS mode while implicitly also taking the RSTL, but are blocked waiting on a DB S mode lock that conflicts with a prepared transaction. The prepared transaction would be blocked from committing if the node was trying to stepdown, but couldn't acquire the RSTL due to the reader thread already holding the RSTL.

This work also might fix deadlocks of this nature that are already possible that we haven't noticed yet. However, I'm not yet sure what complications/side effects making this change would introduce.



 Comments   
Comment by Josef Ahmad [ 19/Jul/23 ]

Interrupting any operation holding the RSTL during stepdown would have prevented the deadlock described in SERVER-78662.

Comment by Ali Mir [ 02/May/23 ]

judah.schvimer@mongodb.com Fair enough, that makes sense to me. I haven't fully thought it through yet, but I'm just unsure about the potential complexity around introducing interruptions for all internal read operations on failover (internal operations are non-lock free).

Comment by Judah Schvimer [ 02/May/23 ]

we'd interrupt any non-lock free read operation on failover.

I think we want to do this. To schwerin@mongodb.com's point, the purpose of the RSTL is to synchronize operations with failover. Thus if an operation doesn't want to be interrupted on failover, it shouldn't hold the RSTL.

Comment by Ali Mir [ 02/May/23 ]

Reassigning back to backlog as I'm beginning my loan on Sharding. Happy to continue the conversation with whoever picks this up.

Comment by Ali Mir [ 02/May/23 ]

schwerin@mongodb.com Are you suggesting that we kill operations that hold the RSTL and a DB/collection lock in S mode? Or generally kill operations that hold the RSTL on stepdown? I agree with the first case, and maybe we should consider moving ahead with that idea. The second case seems too broad — we'd interrupt any non-lock free read operation on failover.

Comment by Andy Schwerin [ 02/May/23 ]

If we don't kill these operations, how do we ensure that they abort or release their locks to make room for unplanned step downs. The original use for this lock was to gather up and kill operations that would otherwise block step downs.

Comment by Ali Mir [ 02/May/23 ]

Some notes: even though lock-free reads are used for external non-transaction read operations, internal readers still take the RSTL, and killing those operations would result in interruptions. For that reason, I don't think broadly killing RSTL-holding operations on step down would be feasible. Beyond that, the question for this ticket becomes: how can we prevent deadlock scenarios like in the linked BF ticket? The ultimate root cause of the deadlock was the DBLock acquisition in S mode instead of IS mode, and therefore I'm investigating to see if there is a way to only kill operations that have taken a DB or Collection lock in S mode whilst holding the RSTL in IX mode.

We don't have many cases where we explicitly take the DB or collection lock in S mode, but in the existing ones I believe it's possible to reproduce the same deadlock. However, my concern is that there are some operations (such as non-PIT dbhash) that explicitly take the DB or collection lock in S mode. Killing these operations would still introduce new interruptions (whereas before, they'd complete or yield). However, given the same scenario with a prepared txn and stepup, the same operation would deadlock, so perhaps an interrupt is a better solution here.

Also, looks like SERVER-71198 is quite similar to this ticket (overall PM-3075 is related). Thinking out loud: I wonder if it would be beneficial to share implementation between SERVER-71198 and this one.

Generated at Thu Feb 08 06:29:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.