[SERVER-58246] Commands flagged as 'never allowed on secondaries' can proceed running after a node steps down from primary Created: 02/Jul/21 Updated: 10/May/22 Resolved: 28/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Serra Torrens | Assignee: | Jordi Serra Torrens |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Operating System: | ALL | ||||||||||||
| Sprint: | Sharding EMEA 2021-10-18, Sharding EMEA 2021-11-01 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
Consider a command flagged as 'kNever' allowed on secondaries. There are several instances of commands that call 'opCtx->setAlwaysInterruptAtStepDownOrUp();' at the begining of their run() (e.g. here), so that they can get interrupted should the node transition to secondary. However, the node could already have transitioned to secondary after passing the commandCanRunHere check, but before marking the opCtx as 'setAlwaysInterruptAtStepDownOrUp'. In this case, the command would proceed running on a secondary. It could be worth to atomically check that the command is allowed to run on this node and mark the opCtx as killable on stepdown in the prologue. |
| Comments |
| Comment by Jordi Serra Torrens [ 28/Oct/21 ] |
|
As per George's comment above, closing this ticket as 'won't do'. |
| Comment by George Wangensteen [ 13/Oct/21 ] |
|
I've spent some time looking into this issue, and, more generally, the proliferation of OperationContext::setAlwaysInterruptAtStepDownOrUp in the codebase and in sharding-commands, which I've spoken with jordi.serra-torrens about. I am looking into how we might provide a better mechanism to address the underlying pain of ensuring interruption for certain commands on replication state change, but unfortunately this ticket is not feasible for us right now.
The current commandCanRunHere check in the command-prologue referred to in the description is provided as a courtesy and is best-effort; there is no guarantee (and never has been) that commands that mark themselves as only runnable on primaries will actually be on a primary when their ::run function begins. Providing a strong-check would require holding the RSTL for the duration of the time one must run on the primary, or at the least holding it while checking the current replication state and setting the interruption-request flag on the operation context (although even this should not be necessary: operations are generally interrupted if they had (ever, in their lifetime) taken the RSTL in a mode conflicting with writes - and this is how all operations were interrupted due to in incompatible repl state change before this flag was added). Acquiring the global lock in this way in the command prologue is undesirable and has caused issues in the past; we'd like to avoid doing so. In the past, commands would simply error out/be interrupted later in their lifecycles, after they had determined they would actually need to perform an action incompatible with being on a secondary and therefore taken the RSTL in mode at least IX. I am currently investigating why this approach is not meeting the needs of some code/why the setAlwaysInterruptAtStepDownOrUp flag is being used (jordi.serra-torrens is more familiar with the details), to see if there is a better solution, but unfortunately we can't do this atomic check in the command prologue now.
Note that, in general, the RSTL must be held to safely read the current replication state; the setAlwaysInterruptAtStepDownOrUp function is unsynchronized with this lock and is therefore unsafe without explicitly acquiring the lock if to use if one expects to be a specific replication state when it is called (we hope to make this more clear in the future).
After talking with Jordi we've decided to assign this back to the Sharding EMEA backlog for now so the linked BFs can be re-triaged; then I will close this ticket.
|