[SERVER-41033] set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption Created: 07/May/19  Updated: 29/Oct/23  Resolved: 29/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.13

Type: Task Priority: Major - P3
Reporter: Suganthi Mani Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-41034 Invariant if we get a prepare conflic... Closed
Related
is related to SERVER-40700 Deadlock between read prepare conflic... Closed
is related to SERVER-44577 Ensure WiredTiger cursors have starte... Closed
is related to SERVER-41462 do not lock RSTL for uninterruptible ... Closed
is related to SERVER-44045 allow secondary index builds to start... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2019-05-20, Execution Team 2019-06-03
Participants:

 Description   

Currently, a three-way deadlock is possible in the scenario below if step-up doesn't wait for the index build to complete:

  • The node is a secondary and starts an index build in a background process under runWithoutInterruptionExceptAtGlobalShutdown. Let's say the index build's plan executor has currently yielded the lock.
  • Step-up is able to acquire the RSTL in mode X and complete its step-up process.
  • The node is now primary, and the index build can be blocked by a prepared transaction due to a prepare conflict. (Note: on secondaries, a prevention mechanism blocks transactions from being prepared while an index build is in progress. So step-up and rollback, which take the X lock and don't kill operations, wouldn't run into three-way deadlock issues.)
  • The node tries to step down, which blocks behind the index build. Step-down cannot kill the index build for two reasons: 1) the connection is internal, and 2) the index build is running with an interrupt guard (runWithoutInterruptionExceptAtGlobalShutdown).
  • The commitTransaction command is waiting to acquire the RSTL in IX mode but is blocked behind the step-down thread.

This ticket should make step-up wait for the background process (index build) to complete after it has acquired the RSTL in X mode but released the repl mutex (as we do in rollback).
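
The scenario above can be read as a cycle in a wait-for graph. The following is a minimal Python sketch of that reading, not server code: the node names (index_build, commit_transaction, step_down) and the find_cycle helper are hypothetical labels introduced here for illustration.

```python
# Toy wait-for graph for the deadlock described above.
# Node names are illustrative labels, not MongoDB identifiers.
WAITS_FOR = {
    # The index build hits a prepare conflict; only commit/abort resolves it.
    "index_build": ["commit_transaction"],
    # commitTransaction wants the RSTL in IX mode, queued behind step-down.
    "commit_transaction": ["step_down"],
    # Step-down is blocked behind the index build, which it cannot kill.
    "step_down": ["index_build"],
}


def find_cycle(graph, start):
    """Return the nodes of a cycle reachable from start, or None if there is none."""
    path = []        # current depth-first search path
    on_path = set()  # same nodes as a set, for fast membership checks

    def visit(node):
        if node in on_path:
            # Reached a node already on the path: the tail of the path is a cycle.
            return path[path.index(node):] + [node]
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    return visit(start)


print(find_cycle(WAITS_FOR, "index_build"))
```

Removing any one edge breaks the cycle in this toy graph. The proposal in this description removes the situation by having step-up wait for the index build first.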



 Comments   
Comment by Githook User [ 29/May/19 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-41033 Ignore prepared transactions for index builds

On primaries, index builds can ignore prepare conflicts because index builds take an exclusive
collection lock before completing, requiring prepared transactions to commit or abort.
On secondaries, prepared transactions already block replication until background index builds
complete.
Branch: master
https://github.com/mongodb/mongo/commit/cc1a0e5ef56838820888373c4d01e301a28f066e
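
As a rough illustration of what ignoring prepare conflicts means for a reader, here is a toy Python model. It is a sketch under stated assumptions, not the WiredTiger or MongoDB API: ToyStore, PrepareConflict, and commit_prepared are hypothetical names, and the toy raises an exception where a real reader would block or retry until the prepared transaction resolves.

```python
class PrepareConflict(Exception):
    """Toy stand-in for a storage-engine prepare conflict (hypothetical name)."""


class ToyStore:
    """Toy key-value store with committed values and prepared, unresolved writes."""

    def __init__(self):
        self.committed = {}  # key -> last committed value
        self.prepared = {}   # key -> value written by a prepared transaction

    def read(self, key, ignore_prepare=False):
        # A normal reader conflicts with a prepared update on the key.
        if key in self.prepared and not ignore_prepare:
            raise PrepareConflict(key)
        # With ignore_prepare, the reader skips the prepared update and
        # returns the last committed value instead.
        return self.committed.get(key)

    def commit_prepared(self, key):
        # Resolving the prepared transaction makes its write visible.
        self.committed[key] = self.prepared.pop(key)


store = ToyStore()
store.committed["a"] = 1
store.prepared["a"] = 2

value_seen_by_index_build = store.read("a", ignore_prepare=True)

try:
    store.read("a")
    conflicted = False
except PrepareConflict:
    conflicted = True

store.commit_prepared("a")
value_after_commit = store.read("a")
```

In this toy, the reader that ignores prepared updates never waits on the prepared transaction, which is the property that removes the index build's edge in the deadlock. The commit message above gives the reason this is acceptable on primaries: the exclusive collection lock taken before completion forces prepared transactions to commit or abort first.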

Comment by Michael Cahill (Inactive) [ 14/May/19 ]

Index builds can set ignore_prepare=force after WT-4580 is merged.

Comment by Louis Williams [ 14/May/19 ]

We should still be able to complete SERVER-40177 to enforce prepare conflicts on secondaries, and just have special behavior for index builds.

This problem really only blocks WT-4580. I think we should go ahead and complete these two SERVER tickets and reprioritize WT-4580 in the meantime.

Comment by Judah Schvimer [ 14/May/19 ]

"It would still be possible to block on prepare conflicts and be uninterruptible in certain phases of an index build so this may be out of the question."

I agree since this would lead to a deadlock.

"Postpone SERVER-40177 and WT-4580 indefinitely."

Is there any part of SERVER-40177 that we could do? Like could we enforce them for all operations but index builds? Could we do anything similar in WT-4580 (this seems harder)?

Comment by Louis Williams [ 14/May/19 ]

If we implement this fully, then this ticket will, unfortunately, directly conflict with the work for SERVER-40177 which would move us closer to completing WT-4580, to return errors when performing writes with ignore_prepare=true.

I see a few options:
1. Only ignore prepare conflicts during the collection scan phase. Re-enforce prepare conflicts during the index build drain phase to satisfy WT-4580. It would still be possible to block on prepare conflicts and be uninterruptible in certain phases of an index build so this may be out of the question.
2. Perform the index build drain in separate transactions, suggested by daniel.gottlieb. This would require scanning the side table in one transaction and writing to the index with ignore_prepare=false in another. We have already dealt with numerous bugs in this specific area of code, so this is extremely risky and may be hard to get correct without serious data corruption consequences.
3. Postpone SERVER-40177 and WT-4580 indefinitely.

Comment by Judah Schvimer [ 08/May/19 ]

"But, the proposal over here is to abort the transaction on primary. Am I correct?"

Yes, if we blocked it, it would almost certainly get aborted at the 1 minute timeout, and not release resources in the meantime, so aborting just seems more straightforward and simpler.

"Currently, rollback waits for the index build to finish. If the index build runs for a day, aren't we blocking rollback too?"

We are blocking rollback. That's fine. While rollback is happening there is generally still a primary able to accept writes. Blocking rollback also seems more unavoidable, at least in certain cases. Rollback is relatively uncommon and a disruptive operation as well.

Comment by Suganthi Mani [ 08/May/19 ]

"Instead of blocking stepup on background index builds completing, abort transactions at prepare time if there is a background index build built on a collection included in the transaction. This would be comparable to what we do for secondaries at prepare time now in SERVER-38588."

In SERVER-38588, we block prepare transaction on secondaries if a background index build is running on a collection included in the transaction. But, the proposal over here is to abort the transaction on primary. Am I correct?

"Set ignore_prepare=true throughout all of index building."

This means we won't hit a prepare conflict for an index build that was started in secondary state and carried over into primary state. So when the primary steps down, we still won't be able to kill the index build (because it is an internal operation and runs under runWithoutInterruption). But step-down will wait for the query executor to yield its IX locks (as with a read operation) before continuing. So this solution fixes the deadlock and sounds reasonable.

Currently, rollback waits for the index build to finish. If the index build runs for a day, aren't we blocking rollback too? judah.schvimer

Comment by Judah Schvimer [ 08/May/19 ]

After discussing with milkie, there are two ways we could go about this.

  1. Instead of blocking stepup on background index builds completing, abort transactions at prepare time if there is a background index build built on a collection included in the transaction. This would be comparable to what we do for secondaries at prepare time now in SERVER-38588.
  2. Set ignore_prepare=true throughout all of index building. daniel.gottlieb thinks this is essentially required and milkie thinks it is safe. suganthi.mani, do you agree this would fix the deadlock in this ticket?

I'm repurposing this ticket to be "set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption" and sending to the storage team.

Comment by Judah Schvimer [ 08/May/19 ]

Looking at this now, this behavior would be undesirable since it could prevent a primary from getting elected for over a day. I'm exploring other solutions.

Generated at Thu Feb 08 04:56:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.