[SERVER-58246] Commands flagged as 'never allowed on secondaries' can proceed running after a node steps down from primary Created: 02/Jul/21  Updated: 10/May/22  Resolved: 28/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Jordi Serra Torrens
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-61066 Make shardsvr DDL commands check prim... Closed
is related to SERVER-66351 Audit uses of OperationContext::setAl... Open
Operating System: ALL
Sprint: Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01
Participants:

 Description   

Consider a command flagged as 'kNever' allowed on secondaries.
Before calling the run() of the command, here it is checked that the command is allowed to run given this node repl state. However, the node could transition to secondary once we are past this check and continue running the command.

There are several instances of commands that call 'opCtx->setAlwaysInterruptAtStepDownOrUp();' at the begining of their run() (e.g. here), so that they can get interrupted should the node transition to secondary. However, the node could already have transitioned to secondary after passing the commandCanRunHere check, but before marking the opCtx as 'setAlwaysInterruptAtStepDownOrUp'. In this case, the command would proceed running on a secondary.

It could be worth to atomically check that the command is allowed to run on this node and mark the opCtx as killable on stepdown in the prologue.



 Comments   
Comment by Jordi Serra Torrens [ 28/Oct/21 ]

As per George's comment above, closing this ticket as 'won't do'.
I've created SERVER-61066 to handle this situation in an ad-hoc basis to resolve the two BF that were linked to this ticket.

Comment by George Wangensteen [ 13/Oct/21 ]

I've spent some time looking into this issue, and, more generally, the proliferation of OperationContext::setAlwaysInterruptAtStepDownOrUp in the codebase and in sharding-commands, which I've spoken with jordi.serra-torrens about. I am looking into how we might provide a better mechanism to address the underlying pain of ensuring interruption for certain commands on replication state change, but unfortunately this ticket is not feasible for us right now. 

 

The current commandCanRunHere check in the command-prologue referred to in the description is provided as a courtesy and is best-effort; there is no guarantee (and never has been) that commands that mark themselves as only runnable on primaries will actually be on a primary when their ::run function begins. Providing a strong-check would require holding the RSTL for the duration of the time one must run on the primary, or at the least holding it while checking the current replication state and setting the interruption-request flag on the operation context (although even this should not be necessary: operations are generally interrupted if they had (ever, in their lifetime) taken the RSTL in a mode conflicting with writes - and this is how all operations were interrupted due to in incompatible repl state change before this flag was added). Acquiring the global lock in this way in the command prologue is undesirable and has caused issues in the past; we'd like to avoid doing so. In the past, commands would simply error out/be interrupted later in their lifecycles, after they had determined they would actually need to perform an action incompatible with being on a secondary and therefore taken the RSTL in mode at least IX. I am currently investigating why this approach is not meeting the needs of some code/why the setAlwaysInterruptAtStepDownOrUp flag is being used (jordi.serra-torrens  is more familiar with the details), to see if there is a better solution, but unfortunately we can't do this atomic check in the command prologue now.

 

Note that, in general, the RSTL must be held to safely read the current replication state; the setAlwaysInterruptAtStepDownOrUp function is unsynchronized with this lock and is therefore unsafe without explicitly acquiring the lock if to use if one expects to be a specific replication state when it is called (we hope to make this more clear in the future). 

 

After talking with Jordi we've decided to assign this back to the Sharding EMEA backlog for now so the linked BFs can be re-triaged; then I will close this ticket. 

 

 

Generated at Thu Feb 08 05:44:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.