[SERVER-46663] runCmdOnPrimaryAndAwaitResponse() should not run DBDirect client command with the rstl lock held. Created: 05/Mar/20  Updated: 09/Mar/20  Resolved: 09/Mar/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-46664 runCmdOnPrimaryAndAwaitResponse() sho... Closed
Operating System: ALL
Sprint: Execution Team 2020-03-09
Participants:

 Description   

Currently, runCmdOnPrimaryAndAwaitResponse() takes the RSTL in IX mode and then runs a DBDirectClient command using AlternativeClientRegion. If the DBDirectClient command acquires the RSTL on the AlternativeClientRegion's opCtx, this can lead to a deadlock between the stepdown thread and the thread that calls runCmdOnPrimaryAndAwaitResponse().
The deadlock occurs because runCmdOnPrimaryAndAwaitResponse() uses the AlternativeClientRegion class to stash the caller's original opCtx (including its locks) and replace it with a new opCtx. Stepdown therefore has no way to interrupt the original opCtx and make it release its locks. As a result, the stepdown thread is blocked behind the RSTL held by runCmdOnPrimaryAndAwaitResponse() due to a lock conflict. The DBDirectClient command is blocked behind stepdown, because stepdown has enqueued its RSTL request in X mode. And runCmdOnPrimaryAndAwaitResponse() is waiting for the DBDirectClient command's response, which closes the cycle.
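
The wait cycle described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's lock manager: FifoLock, build_scenario, find_cycle and the owner names are all hypothetical, and the model assumes a FIFO conflict queue in which a new request also waits behind any conflicting request that is already pending.

```python
from collections import deque

# Compatibility of the two lock modes mentioned in the ticket.
COMPATIBLE = {
    ("IX", "IX"): True,
    ("IX", "X"): False,
    ("X", "IX"): False,
    ("X", "X"): False,
}


class FifoLock:
    """Toy lock with a FIFO conflict queue (hypothetical, not the server's code)."""

    def __init__(self):
        self.granted = []       # (owner, mode) pairs currently holding the lock
        self.pending = deque()  # (owner, mode) pairs waiting, in arrival order

    def request(self, owner, mode):
        """Return the owners blocking this request; an empty list means granted."""
        ahead = self.granted + list(self.pending)
        blockers = [o for o, m in ahead if not COMPATIBLE[(m, mode)]]
        if blockers:
            self.pending.append((owner, mode))
        else:
            self.granted.append((owner, mode))
        return blockers


def build_scenario():
    """Replay the sequence from the description and return the wait-for graph."""
    rstl = FifoLock()
    waits_for = {}

    # 1. The caller takes the RSTL in IX mode on its original opCtx.
    assert rstl.request("caller_opCtx", "IX") == []

    # 2. Stepdown enqueues X. Assumption: it cannot interrupt the stashed
    #    opCtx, so it simply waits behind the caller's IX.
    waits_for["stepdown"] = rstl.request("stepdown", "X")

    # 3. The DBDirectClient command asks for IX on the alternative opCtx and
    #    queues behind the pending X request.
    waits_for["direct_client_opCtx"] = rstl.request("direct_client_opCtx", "IX")

    # 4. The caller blocks until the DBDirectClient command responds.
    waits_for["caller_opCtx"] = ["direct_client_opCtx"]
    return waits_for


def find_cycle(waits_for, start):
    """Follow the first wait-for edge from start; return the cycle if one exists."""
    path, seen, node = [start], {start}, start
    while waits_for.get(node):
        node = waits_for[node][0]
        if node in seen:
            return path[path.index(node):] + [node]
        path.append(node)
        seen.add(node)
    return None


if __name__ == "__main__":
    cycle = find_cycle(build_scenario(), "stepdown")
    print(" -> ".join(cycle) if cycle else "no cycle")
```

In this model, no participant can make progress unless the caller's IX hold is released, and the caller is itself waiting on the command that needs the lock.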

