[SERVER-41033] set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption Created: 07/May/19 Updated: 29/Oct/23 Resolved: 29/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.13 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Storage NYC 2019-05-20, Execution Team 2019-06-03 |
| Participants: | |
| Description |
|
Currently there can be a 3-way deadlock if step up doesn’t wait for the index build to complete in the scenario below:
This ticket should make step up wait for background processes (index builds) to complete after it has acquired the RSTL in X mode but has released the repl mutex (as we do in rollback). |
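The ordering proposed in the description can be pictured with a small toy model. This is only a sketch under stated assumptions, not MongoDB server code: `ToyReplCoordinator`, its methods, and the two plain locks standing in for the RSTL (X mode) and the repl mutex are all hypothetical names invented for illustration.

```python
import threading


class ToyReplCoordinator:
    """Hypothetical toy model; none of these names exist in the server."""

    def __init__(self):
        self.rstl = threading.Lock()        # stands in for the RSTL held in X mode
        self.repl_mutex = threading.Lock()  # stands in for the repl mutex
        self._active_builds = 0
        self._builds_cv = threading.Condition()
        self.state = "SECONDARY"

    def start_index_build(self):
        with self._builds_cv:
            self._active_builds += 1

    def finish_index_build(self):
        with self._builds_cv:
            self._active_builds -= 1
            self._builds_cv.notify_all()

    def step_up(self, timeout=None):
        # Acquire the RSTL stand-in first.
        with self.rstl:
            # Wait for background index builds WITHOUT holding the repl mutex,
            # so other threads that need that mutex are not blocked meanwhile.
            with self._builds_cv:
                drained = self._builds_cv.wait_for(
                    lambda: self._active_builds == 0, timeout
                )
            if not drained:
                return False  # builds still running; give up on this attempt
            # Only now take the repl mutex, briefly, to change state.
            with self.repl_mutex:
                self.state = "PRIMARY"
            return True
```

In this toy, a step up attempted while a build is active simply waits (or times out), which also makes the concern raised later in the thread visible: a long-running build delays the transition for as long as it runs.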
| Comments |
| Comment by Githook User [ 29/May/19 ] |
|
Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)
Message: On primaries, index builds can ignore prepare conflicts because index builds take an exclusive |
| Comment by Michael Cahill (Inactive) [ 14/May/19 ] |
|
Index builds can set ignore_prepare=force after |
| Comment by Louis Williams [ 14/May/19 ] |
|
We should still be able to complete This problem really only blocks |
| Comment by Judah Schvimer [ 14/May/19 ] |
I agree since this would lead to a deadlock.
Is there any part of |
| Comment by Louis Williams [ 14/May/19 ] |
|
If we implement this fully, then this ticket will, unfortunately, directly conflict with the work for I see a few options: |
| Comment by Judah Schvimer [ 08/May/19 ] |
Yes, if we blocked it, it would almost certainly get aborted at the 1-minute timeout and would not release resources in the meantime, so aborting just seems simpler.
We are blocking rollback. That's fine. While rollback is happening there is generally still a primary able to accept writes. Blocking rollback also seems more unavoidable, at least in certain cases. Rollback is also a relatively uncommon and disruptive operation. |
| Comment by Suganthi Mani [ 08/May/19 ] |
In
This means we won't hit a prepare conflict for an index build that was started in secondary state and carried over into primary state. So, when the primary steps down, we still won't be able to kill the index build (because it is an internal operation and runs under runWithoutInterruption). But we will wait for the query executor to yield the IX locks (similar to a read operation) so that step down can continue. So this solution fixes the deadlock and sounds reasonable. Currently, rollback waits for the index build to finish. If the index build runs for a day, aren't we blocking rollback too? judah.schvimer |
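To make the idea of "not hitting a prepare conflict" concrete, here is a minimal toy model. It is a sketch under stated assumptions, not the storage engine's implementation: `ToyStore`, `PrepareConflict`, and the `ignore_prepare` keyword are hypothetical names. The toy assumes that a read which ignores prepared updates returns the last committed value instead of blocking or failing.

```python
class PrepareConflict(Exception):
    """Raised when a read meets a prepared, not-yet-committed update (toy model)."""


class ToyStore:
    """Hypothetical key-value store with prepared updates; illustration only."""

    def __init__(self):
        self._committed = {}
        self._prepared = {}

    def put(self, key, value):
        self._committed[key] = value

    def prepare(self, key, value):
        # The update is staged but not yet visible as committed data.
        self._prepared[key] = value

    def commit_prepared(self, key):
        self._committed[key] = self._prepared.pop(key)

    def read(self, key, ignore_prepare=False):
        # Default behavior: a reader that meets a prepared update conflicts.
        if key in self._prepared and not ignore_prepare:
            raise PrepareConflict(key)
        # With ignore_prepare, the reader skips the prepared update and
        # sees only the last committed value.
        return self._committed.get(key)
```

An index build that reads in the ignoring mode would, in this toy, never stall behind a prepared transaction, which is the property the comment above relies on to break the deadlock.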
| Comment by Judah Schvimer [ 08/May/19 ] |
|
After discussing with milkie, there are two ways we could go about this.
I'm repurposing this ticket to be "set ignore_prepare=true throughout any part of index building that happens in runWithoutInterruption" and sending to the storage team. |
| Comment by Judah Schvimer [ 08/May/19 ] |
|
Looking at this now, this behavior would be undesirable since it could prevent a primary from getting elected for over a day. I'm exploring other solutions. |