[SERVER-40167] Index key removal should not encounter prepare conflicts on unrelated keys Created: 15/Mar/19 Updated: 06/Dec/22 Resolved: 12/Aug/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | newgrad, txn_storage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Sprint: | Storage NYC 2019-05-06, Execution Team 2021-02-08, Execution Team 2021-02-22 |
| Linked BF Score: | 8 |
| Description |
|
After an index key is deleted, the cursor on the index is re-established by calling restoreState(). This restore uses a search_near to reposition the cursor at its original position. Because that key has just been deleted, the search_near positions the cursor on the logically adjacent key instead. If the adjacent key is involved in a prepared transaction, the cursor restoration encounters a prepare conflict that the delete would not otherwise have encountered at that point.

Calling restoreState() after deleting each key effectively repositions the cursor on a deleted entry every single time. It may make sense to instead position the cursor directly on the next key for deletion.

Note: This is only true once
Example stacktrace:
|
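The cursor behavior in the Description can be illustrated with a toy model. This is a minimal sketch and not MongoDB or WiredTiger code: `ToyIndexCursor`, `PrepareConflict`, `delete_with_restore`, and `delete_direct` are hypothetical names, and the rule that `search_near` prefers the next larger key is an assumption made only for this example.

```python
import bisect


class PrepareConflict(Exception):
    """Raised when the cursor lands on a key owned by a prepared transaction."""


class ToyIndexCursor:
    """Hypothetical stand-in for an index cursor over sorted keys."""

    def __init__(self, keys, prepared):
        self.keys = sorted(keys)
        self.prepared = set(prepared)  # keys touched by a prepared transaction

    def _visit(self, key):
        if key in self.prepared:
            raise PrepareConflict(key)
        return key

    def search(self, key):
        """Exact-match positioning; fails if the key is absent."""
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            raise KeyError(key)
        return self._visit(key)

    def search_near(self, key):
        """Land on key if present, else the next larger key, else the largest key."""
        if not self.keys:
            return None
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys):
            i -= 1
        return self._visit(self.keys[i])

    def remove(self, key):
        self.keys.remove(key)


def delete_with_restore(cursor, targets):
    """Delete each target, then 'restore' the cursor on the key just deleted."""
    for key in targets:
        cursor.search(key)
        cursor.remove(key)
        cursor.search_near(key)  # lands on a neighbor, possibly an unrelated prepared key


def delete_direct(cursor, targets):
    """Delete each target by positioning directly on the next key to delete."""
    for key in targets:
        cursor.search(key)
        cursor.remove(key)
```

With keys 10, 20, 30, where only 30 belongs to a prepared transaction, deleting 10 and 20 through `delete_with_restore` raises `PrepareConflict` on 30 even though 30 was never a deletion target. `delete_direct` removes the same two keys without touching 30.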
| Comments |
| Comment by Githook User [ 08/Sep/21 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'} Message: |
| Comment by Louis Williams [ 12/Aug/21 ] |
|
We have decided that we will not fix this problem. Prepare conflicts are meant to resolve as quickly as possible, so any visible stalls caused by this behavior should not last for long periods of time. |
| Comment by Louis Williams [ 27/Jun/19 ] |
|
I recently filed |
| Comment by Louis Williams [ 07/May/19 ] |
|
Flagging for scheduling. The work required would be to not restore cursors in the delete code path. Additionally, this issue is hard to observe in the wild, and likely would be hard to distinguish from expected prepare conflicts. |
| Comment by Judah Schvimer [ 01/May/19 ] |
|
Removing from the prepare epic so this can be prioritized by the storage team separately from the rest of the project. |
| Comment by Louis Williams [ 28/Mar/19 ] |
|
The storage issue is that Cursor::restore() on a missing record may accidentally land on an adjacent record which is part of a prepared transaction. This would ideally be fixed by

The query involvement is that the DeleteStage calls saveState(), deletes a document, then calls restoreState(). This closes the cursor and then repositions it on a deleted record, which may introduce this undesirable behavior. The goal of this ticket would be to not restore Index/CollectionScan cursors after deleting documents because they always reposition on deleted records. Even if

I'll assign this back to storage so we can investigate how this would actually work. |
| Comment by Craig Homa [ 28/Mar/19 ] |
|
Hey louis.williams, the Query team is wondering why this is in their bucket. Should this sit with sharding instead? |
| Comment by Louis Williams [ 18/Mar/19 ] |
|
I think the reason is simply that each passthrough test operates on its own session and its own collection. We’ll never see this issue there because it only happens when different sessions are operating on the same collection. Even for FSM tests, prepared transactions always end up getting committed or aborted. So even if an operation encounters a prepare conflict, the conflicting prepared transactions will eventually finish, and the conflict will resolve. This issue seems isolated only to targeted replica set tests where prepared transactions are held open and another operation is attempted. |
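The difference between a prepared transaction that eventually finishes and one that is held open can be sketched abstractly. This is an illustration only: `run_with_retries` and its tick-based polling are hypothetical, and how the server actually waits on a prepare conflict is not modeled here.

```python
def run_with_retries(prepared_resolves_at, max_ticks):
    """Return the tick at which the blocked operation proceeds, or None if it never does.

    prepared_resolves_at: tick at which the conflicting prepared transaction
    commits or aborts, or None if the test holds it open indefinitely.
    """
    for tick in range(max_ticks):
        still_prepared = prepared_resolves_at is None or tick < prepared_resolves_at
        if not still_prepared:
            return tick  # conflict resolved, the operation can proceed
    return None  # the operation stayed blocked for the whole window
```

In this model `run_with_retries(3, 10)` proceeds once the prepared transaction finishes, which mirrors the FSM case. `run_with_retries(None, 10)` never proceeds, which mirrors a targeted test that holds the prepared transaction open.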
| Comment by Judah Schvimer [ 18/Mar/19 ] |
|
louis.williams, why do you not expect this to happen in jscore passthroughs that wrap crud ops in transactions or concurrency suites that use transactions? |
| Comment by Louis Williams [ 18/Mar/19 ] |
|
judah.schvimer, I wouldn't say this is 'blocking' passthrough testing. If this failure were to occur outside the suites where it has been observed, it may be tricky to diagnose. Outside of testing, my concern is the difficulty of diagnosis if a user happens to encounter this. It would be very unexpected and, in general, is not correct behavior for MongoDB. |
| Comment by Judah Schvimer [ 16/Mar/19 ] |
|
louis.williams, am I correct that this is effectively blocking a lot of our passthrough testing for sharded transactions because we would get hard-to-diagnose BFs as a result? If so, I will mark this as a "Blocker" to escalate its priority. |
| Comment by Louis Williams [ 15/Mar/19 ] |
|
I filed |