[SERVER-40594] Range deleter in prepare conflict retry loop blocks step down Created: 11/Apr/19 Updated: 29/Oct/23 Resolved: 23/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.12 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Matthew Saltz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Note that this repro relies on sleeps to have the collection range deleter run after the transaction is prepared and before the step down attempt, so it may need to be repeated to trigger the hang.
|
| Sprint: | Sharding 2019-05-06, Sharding 2019-05-20, Sharding 2019-06-03 |
| Participants: |
| Linked BF Score: | 19 |
| Description |
|
Replication step down requires the ReplicationStateTransitionLock in MODE_X and kills user operations, but it doesn't kill internal operations, like those run by the collection range deleter. If the range deleter runs and enters a prepare conflict retry loop (which waits without yielding locks), it will hang until the prepared transaction modifying the data it is reading commits or aborts. The RSTL can't be taken in exclusive mode until the range deleter operation finishes, so during this time all step down attempts will time out waiting for the RSTL. This should also be a problem for step up (and other operations that require the RSTL) and may be triggered by other internal operations that can read prepared data, but I've only seen this so far with step down and the range deleter. The step up case might be worse, because a prepared transaction can't commit or abort and unblock an internal operation if there's no primary. |
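The interaction described above can be illustrated with a toy model. This is only a sketch and not MongoDB code: `ToyRSTL`, `simulate`, `range_deleter`, and the timeout value are hypothetical names chosen for illustration, and the real lock manager and prepare conflict logic are far more involved. The assumption is that the internal operation holds the lock in a non-exclusive mode while it waits on the prepared transaction, so an exclusive acquisition with a timeout fails until the transaction commits or aborts.

```python
import threading


class ToyRSTL:
    """Toy shared/exclusive lock standing in for the RSTL (hypothetical model)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive = False

    def acquire_shared(self):
        with self._cond:
            while self._exclusive:
                self._cond.wait()
            self._shared_holders += 1

    def release_shared(self):
        with self._cond:
            self._shared_holders -= 1
            self._cond.notify_all()

    def try_acquire_exclusive(self, timeout):
        """Return True if the lock was taken exclusively within `timeout` seconds."""
        with self._cond:
            free = self._cond.wait_for(
                lambda: self._shared_holders == 0 and not self._exclusive,
                timeout=timeout,
            )
            if free:
                self._exclusive = True
            return free

    def release_exclusive(self):
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()


def simulate(stepdown_timeout=0.2):
    """Return (step down result while txn is prepared, result after it resolves)."""
    rstl = ToyRSTL()
    txn_resolved = threading.Event()    # set when the prepared txn commits or aborts
    deleter_holding = threading.Event()  # set once the internal op holds the lock

    def range_deleter():
        rstl.acquire_shared()
        deleter_holding.set()
        # Stand-in for the prepare conflict retry loop: wait for the
        # transaction to resolve WITHOUT yielding the lock.
        txn_resolved.wait()
        rstl.release_shared()

    worker = threading.Thread(target=range_deleter)
    worker.start()
    deleter_holding.wait()

    # Step down attempt while the transaction is still prepared: times out.
    blocked_attempt = rstl.try_acquire_exclusive(stepdown_timeout)

    # The transaction commits or aborts, which unblocks the internal operation.
    txn_resolved.set()
    worker.join()

    # Step down attempt after the transaction resolved: succeeds.
    later_attempt = rstl.try_acquire_exclusive(stepdown_timeout)
    if later_attempt:
        rstl.release_exclusive()
    return blocked_attempt, later_attempt


if __name__ == "__main__":
    print(simulate())
```

In this model `simulate()` returns `(False, True)`: the first exclusive attempt times out while the toy range deleter waits on the prepared transaction, and the second succeeds once the transaction has resolved. If the transaction could never resolve (the step up concern, where there is no primary to commit or abort it), the first outcome would repeat indefinitely.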
| Comments |
| Comment by Githook User [ 23/May/19 ] |
|
Author: {'email': 'matthew.saltz@mongodb.com', 'name': 'Matthew Saltz', 'username': 'saltzm'} Message: |
| Comment by Judah Schvimer [ 13/May/19 ] |
|
If an unconditional stepdown happens between prepare and commit, the node that needs to step down would effectively have to wait forever for the transaction to be committed, since it will only receive the commit as a secondary. |
| Comment by Matthew Saltz (Inactive) [ 13/May/19 ] |
|
judah.schvimer I think the reason is that the situation described in this ticket for step down isn't a true deadlock - it's only problematic as long as the transaction never commits or aborts. As soon as the transaction completes, step down can continue and succeed. Since our tests are written to not leave prepared transactions hanging (AFAIK), I wouldn't expect to run into this in the concurrency suites. jack.mulrow can you confirm that this understanding is correct? |
| Comment by Judah Schvimer [ 08/May/19 ] |
|
As part of this ticket, I would like to investigate why we aren't catching these deadlocks in our concurrency_sharded_with_stepdowns_and_balancer suite. Are chunk migrations or range deletion not getting prepare conflicts for some reason? Are we not actually moving any chunks from the balancer? CC max.hirschhorn |
| Comment by Suganthi Mani [ 07/May/19 ] |
|
This ticket is not blocked on the other work ( |
| Comment by Judah Schvimer [ 07/May/19 ] |
|
Are there any other parts of chunk migration that risk the same deadlock? |
| Comment by Judah Schvimer [ 22/Apr/19 ] |
|
I'm marking this as blocked until |
| Comment by Jack Mulrow [ 11/Apr/19 ] |
None in particular - I only put that down since it came up in the sharded txn / prepare standup that this could (theoretically) also affect step up and I didn't want to forget that. |
| Comment by Judah Schvimer [ 11/Apr/19 ] |
|
Note a similar discussion was held around |
| Comment by Judah Schvimer [ 11/Apr/19 ] |
|
jack.mulrow, step up doesn't kill operations at all, but also doesn't expect arbitrary internal processes to be reading or writing to user data. What internal operations are you thinking of? |