[SERVER-39976] Two-phase index builds on primaries should ignore interrupts Created: 06/Mar/19  Updated: 08/Jan/24  Resolved: 13/Feb/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Task Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-45852 Two-phase index build constraints sho... Closed
related to SERVER-37729 killOp must fail to kill secondary in... Closed
related to SERVER-44791 Abort index builds by interrupting th... Closed
is related to SERVER-37726 Make dropIndexes abort in-progress in... Closed
Backwards Compatibility: Minor Change
Sprint: Execution Team 2019-09-23, Execution Team 2020-02-10, Execution Team 2020-02-24
Participants:

 Description   

A simultaneous index build should only be killable at shutdown when run on a secondary. We'll need to turn it off if the node steps up; and turn it on if the node steps down. InterruptedAtShutdown errors should still get through, but not killOp errors.



 Comments   
Comment by Eric Milkie [ 13/Feb/20 ]

This commit changes the killop behavior for index builds. Now, the only way to kill an index build is to killOp the connection operation (or use the dropIndexes command). Previously, it was also possible to kill an index build by using killOp on the operation from the index builds coordinator; this is no longer possible.

Comment by Githook User [ 07/Feb/20 ]

Author:

{'name': 'Louis Williams', 'username': 'louiswilliams', 'email': 'louis.williams@mongodb.com'}

Message: SERVER-39976 Two-phase index builds on primaries should ignore interrupts
Branch: master
https://github.com/mongodb/mongo/commit/ab43da8de9068e5cd4ac31f069b33c4bc2d0dd90

Comment by Louis Williams [ 24/Oct/19 ]

brian.lane, yes I seems like we've agreed to make index builds unkillable and require using the dropIndexes command to kill active builds instead, SERVER-37726. To reduce the length of the critical section, we will also do SERVER-39458.

Comment by Brian Lane [ 24/Oct/19 ]

Hey benety.goh - I believe it does.

If the decision was to take that approach, is that a fair statement louis.williams?

I wasn't sure if louis.williams and mira.carey@mongodb.com discussed this offline as I wasn't sure what mira.carey@mongodb.com thought about SERVER-39458.

-Brian

Comment by Benety Goh [ 18/Oct/19 ]

brian.lane, louis.williams, does SERVER-37726 adequately describe the dropIndexes enhancements discussed in this ticket?

Comment by Brian Lane [ 04/Oct/19 ]

Hey louis.williams - I like the idea of using dropindex instead of adding another command. To mira.carey@mongodb.com's point as well - if he is accepting of SERVER-39458 then sgtm. If not, I am happy to have more discussions.

-Brian

Comment by Louis Williams [ 02/Oct/19 ]

brian.lane, can you read over my last comment and let me know what you think? Dianna brought up that it will lead to unexpected killOp behavior, which is true. The suggestion I made was to provide an alternative option to killing indexes, using dropIndex, or potentially a new command.

mira.carey@mongodb.com one of the main goals of this project is to have index builds survive stepdowns, so yes they will continue to be exceptional. Index builds yield locks frequently, but there is one main critical section. This is a long section at the end of the build where writes are blocked and the final entries are applied to the index. We do plan on making the length of that critical section as short as possible, see SERVER-39458, in anticipation of this problem. If we can put measures in place to shorten this critical section, which could block stepdown, do you think this is still a reasonable approach?

Comment by Mira Carey [ 01/Oct/19 ]

To chime in here, we currently have a policy of avoiding making opCtx's unkillable unless absolutely necessary. In general, having unkillable ops makes it hard to reason about whether stepdown can succeed, reduces our ability to resolve things like prepare conflicts and generally indicates to me that we have some unfortunately long and dangerous critical section buried somewhere (if the operation fails for some other reason, what state have we left the system in).

Index builds have long been the exception to the rule, but every time we double down there I'm going to want to bring this up again and make sure we've exhausted our alternatives.

Comment by Dianna Hohensee (Inactive) [ 01/Oct/19 ]

louis.williams, you should run the idea by product: making something on a primary unable to be killed. This will make the killOp cmd not work against everything: I believe today the only ops it cannot kill are replication threads on secondaries.

Comment by Louis Williams [ 01/Oct/19 ]

After discussing with geert.bosch, I think we want to take the following approach instead:

  • Make all IndexBuildCoordinator threads uninterruptible. This solves the problem of ignoring interrupts by killOp or step down.
  • If an index build needs to be killed by a user, they will use the dropIndexes command to abort the in-progress index build (see SERVER-37726). This will abort the index build and write the "abortIndexBuild" oplog entry.
  • We may optionally choose to implement an abortIndexBuild command in the future, but only if we find that to be a better approach for some reason.

This work for this ticket should be to simply make all index build threads uninterruptible except at shutdown.

Comment by Dianna Hohensee (Inactive) [ 20/Sep/19 ]

For what it's worth, we have already have a OperationContextGroup to mass modify OperationContexts; and you could add stepUp and stepDown hooks to the IndexBuildsCoordinator to mark OperationContexts as interruptible or impervious – along with getting rid of the code block of un-interruptibility from IndexBuildsCoordinator and replacing it with more flexible OperationContext functions to set and unset interruptibility modes.

Comment by Louis Williams [ 20/Sep/19 ]

After more discussion, we have considered reverting the behavior that makes secondaries ignore interrupts. In 4.0 and before, interrupting a secondary index build crashed a node. The behavior added to 4.0 anticipated the case where a system administrator uses killop on a primary and races with stepdown. This is a hard case to justify considering the amount of work required, and it is not a correctness problem if the node crashes.

Considering the interruptibility due to step-up is a little more complicated. We may still need error code whitelist to pass to _runWithoutInterruption to ignore "InterruptedDueToReplStateChange" in addition to "InterruptedAtShutdown". If not a whitelist, we may still need to implement my first proposed solution. This is not ideal and makes understanding the interruptibility of operations more complicated based off of transient state, instead of the code as it is written.

Comment by Louis Williams [ 10/Sep/19 ]

I think this is going to require some interesting service arch changes. At the moment, interrupt suppression is handled by an IgnoreInterruptsGuard, which effectively sets a flag, _ignoreInterrupts, on the OperationContext to early-return in checkForInterruptNoAssert().

Once the _buildIndex() function is called to actually build the index, the _ignoreInterrupts state is no longer modified by the OperationContext, and stays that way until the function returns. This is problematic because we need a way to dynamically control whether this function is interruptible, depending on the replication state.

I can think of a couple of options:

  • checkInterruptNoAssert needs to know the current replication state to decide whether interrupts should still be suppressed. This could be implemented as follows: checkForInterruptNoAssert() would check some precondition set with a new function, runWithoutInterruptionWhile(Callback&& cb, Predicate&& predicate). Rather than checking _ignoreInterrupts, checkForInterruptNoAssert() would check the Predicate stored on its OperationContext to determine if the operation is interruptible (also with the exception of global shutdown). This predicate would be implemented by the IndexBuildsCoordinator as a check for the current replication state.
  • create onStepUp and onStepDown hooks in ReplicationCoordinatorExternalState. These would need to somehow locate the OperationContext building the index, and then refactor the push/popIgnoreInterrupts to unset the _ignoreInterruptsState in the event of a replication state change. In terms of "locating" the OperationContext, this seems challenging and might require a registry of OperationContext/Client to index build mappings. To me, this is asking for concurrency bugs.

mira.carey@mongodb.com, do you mind giving some feedback on whether or not these ideas are realistic and could manifiest in a service arch ticket? 

Generated at Thu Feb 08 04:53:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.