[SERVER-40176] Cursor seekExact should not use WT_CURSOR:search_near to avoid unintentional prepare conflicts Created: 15/Mar/19 Updated: 26/Oct/23 Resolved: 17/Jun/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | txn_storage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Sprint: | Storage NYC 2019-04-08, Storage NYC 2019-05-06, Storage NYC 2019-05-20 |
| Participants: |
| Linked BF Score: | 5 |
| Description |
As described in
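To illustrate the problem named in the title, here is a toy Python model (not WiredTiger code) contrasting an exact lookup with a search_near-style lookup. All names are hypothetical. The model assumes that reading a key whose latest update belongs to a prepared transaction yields WT_PREPARE_CONFLICT, and it simplifies search_near to "try the next larger key, then the next smaller one" when the exact key is absent.

```python
import bisect

NOTFOUND = "WT_NOTFOUND"
PREPARE_CONFLICT = "WT_PREPARE_CONFLICT"


class ToyTable:
    """Hypothetical sorted key space; NOT the WiredTiger API.

    entries maps key -> (value, prepared), where prepared=True means the key's
    latest update belongs to a prepared, uncommitted transaction.
    """

    def __init__(self, entries):
        self.entries = dict(entries)
        self.keys = sorted(self.entries)

    def _read(self, key):
        value, prepared = self.entries[key]
        if prepared:
            return PREPARE_CONFLICT, None
        return 0, (key, value)

    def search(self, key):
        """Exact lookup: only the requested key is ever examined."""
        if key not in self.entries:
            return NOTFOUND, None
        return self._read(key)

    def search_near(self, key):
        """Exact key if present; otherwise land on a neighboring key."""
        if key in self.entries:
            return self._read(key)
        i = bisect.bisect_left(self.keys, key)
        for j in (i, i - 1):  # next larger key first, then next smaller
            if 0 <= j < len(self.keys):
                return self._read(self.keys[j])
        return NOTFOUND, None


def seek_exact_via_search_near(table, key):
    """A seekExact built on search_near: a conflict on a neighbor leaks out."""
    status, found = table.search_near(key)
    if status != 0:
        return status, None
    found_key, value = found
    return (0, value) if found_key == key else (NOTFOUND, None)
```

With `table = ToyTable({"a": ("doc-a", False), "c": ("doc-c", True)})`, where `"c"` is prepared, `table.search("b")` returns `WT_NOTFOUND`, but `seek_exact_via_search_near(table, "b")` reports `WT_PREPARE_CONFLICT` even though the caller never asked about `"c"`.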
| Comments |
| Comment by Louis Williams [ 07/May/19 ] |
We decided not to make any changes to the storage API and will instead perform inserts as inserts, rather than as upserts, during steady-state replication for
| Comment by Judah Schvimer [ 01/May/19 ] |
Removing this from the prepare epic so that the storage team can prioritize it separately from the rest of the project.
| Comment by Alexander Gorrod [ 30/Apr/19 ] |
louis.williams, I've booked a meeting to discuss this work.
| Comment by Louis Williams [ 24/Apr/19 ] |
michael.cahill, thanks for the summary. This is my understanding of the changes required. I don't think we'll be able to avoid unintentional conflicts on non-unique indexes in all cases, but I'm not aware of a solution that would make them avoidable.
If this sounds right and appropriate, I can open WT tickets for both changes.
| Comment by Michael Cahill (Inactive) [ 15/Apr/19 ] |
Here is my attempt to summarize my understanding of the various issues and the proposed WT API changes:
I'm concerned that search_prefix doesn't help in all cases. In particular, if documents share an ordinary, non-unique index key, then a prepared transaction on an unrelated document that happens to have the same index key value could still cause blocking when an adjacent document is accessed. If we can agree on a complete set of changes that ensures prepare conflicts only cause blocking when accessing the documents involved in a prepared transaction, then we can map out and schedule the WT changes. If we're saying that prepare conflicts with logically unrelated documents are unavoidable in some cases, maybe the search_prefix approach is sufficient, but then I'd like clearer answers to the questions above.
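The non-unique index concern can be made concrete with a toy Python model (hypothetical, not WiredTiger or MongoDB code). As a simplification, it assumes each non-unique index entry is keyed by the index key value followed by the record id, so all entries for one index key share a prefix and sort by record id. A prefix scan looking for one document must walk past the entries of other documents with the same index key, and a prepared entry among them blocks the reader.

```python
import bisect

NOTFOUND = "WT_NOTFOUND"
PREPARE_CONFLICT = "WT_PREPARE_CONFLICT"


def find_via_prefix(entries, index_key, wanted_rid):
    """Hypothetical prefix scan over a non-unique index.

    entries maps (index_key, record_id) -> prepared flag. Only entries whose
    first component equals index_key are visited, in sorted order.
    """
    keys = sorted(entries)
    # (index_key,) sorts before every (index_key, record_id) tuple.
    i = bisect.bisect_left(keys, (index_key,))
    while i < len(keys) and keys[i][0] == index_key:
        if entries[keys[i]]:
            # Prepared entry reached before (or at) the wanted record: block.
            return PREPARE_CONFLICT
        if keys[i][1] == wanted_rid:
            return 0
        i += 1
    return NOTFOUND
```

With `entries = {("blue", 1): True, ("blue", 2): False, ("red", 3): False}`, where document 1 belongs to a prepared transaction, `find_via_prefix(entries, "blue", 2)` returns `WT_PREPARE_CONFLICT` although document 2 is unrelated, while `find_via_prefix(entries, "red", 3)` succeeds because the prefix keeps that scan away from the `"blue"` entries.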
| Comment by Louis Williams [ 02/Apr/19 ] |
michael.cahill, thanks for talking this through.
I don't think it would ever be 100% safe to allow writes while ignoring updates that are part of prepared transactions. To be honest, this is hard to reason about. In the case of key removal (
In this case, MongoDB would have to traverse a small set of keys, but search_prefix should still encounter a WT_PREPARE_CONFLICT. The motivation behind this change is to prevent cursors from returning records they wouldn't find useful, especially if those records are prepared. As long as the only keys a cursor iterates over match the provided prefix, this seems correct to me.
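A minimal sketch of the search_prefix behavior described here, as a hypothetical Python model over a sorted list of string keys rather than the real WT cursor API: the cursor only visits keys that start with the requested prefix, returns WT_PREPARE_CONFLICT if one of those keys is prepared, and returns WT_NOTFOUND instead of stepping onto a non-matching neighbor.

```python
import bisect

NOTFOUND = "WT_NOTFOUND"
PREPARE_CONFLICT = "WT_PREPARE_CONFLICT"


def search_prefix(keys, prepared, prefix):
    """Hypothetical search_prefix: collect keys starting with prefix, in order.

    keys must be sorted; prepared is the set of keys whose latest update
    belongs to a prepared transaction. Keys outside the prefix are never read.
    On a conflict, the matches seen before the prepared key are returned too.
    """
    matches = []
    i = bisect.bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        if keys[i] in prepared:
            return PREPARE_CONFLICT, matches
        matches.append(keys[i])
        i += 1
    if not matches:
        return NOTFOUND, matches
    return 0, matches
```

With `keys = ["user1|a", "user1|b", "user2|a"]` and `prepared = {"user2|a"}`, a search for `"user1|"` returns both `user1` keys without ever reading the prepared `"user2|a"`, a search for `"user2|"` returns `WT_PREPARE_CONFLICT`, and a search for `"user0|"` returns `WT_NOTFOUND` rather than landing on `"user1|a"` the way search_near would.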
In this case, again, I think it's fine to return a WT_PREPARE_CONFLICT. Non-unique indexes can return multiple documents for a single key, and if finding them requires traversing a small set of entries, the cursor should not skip prepared updates. I also think this case would effectively maintain the current behavior of search_near.
This raises a good question about how MongoDB uses cursors, especially for document removal. Even if cursor restoration used search_prefix after a key was deleted, I would expect the return value to be WT_NOTFOUND. Then what? How would MongoDB distinguish EOF from a repositioned cursor that landed on a recently deleted key? If we decide to return the key after hitting a WT_PREPARE_CONFLICT, it may be possible to use the index scan's _endPosition to determine whether the scan is at EOF. Also, I don't imagine this being an issue for the upsert case on secondaries.
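One possible way to separate "the saved key was deleted" from "the scan is at EOF" is to reposition on the first key after the saved one and compare it with the scan's end bound. The sketch below is a hypothetical Python model of that idea: `restore_scan` and `end_key` are illustrative names standing in for cursor restoration and an index scan's _endPosition, not MongoDB's actual implementation, and prepare conflicts are ignored to focus on the EOF question.

```python
import bisect

EOF = "EOF"


def restore_scan(keys, saved_key, end_key, end_inclusive=True):
    """Hypothetical cursor restore for a forward index scan.

    keys: sorted keys currently present. saved_key: the key the cursor was on
    before yielding (it may since have been deleted). Returns the next key the
    scan should return, or EOF if nothing remains within the end bound.
    """
    i = bisect.bisect_left(keys, saved_key)
    if i < len(keys) and keys[i] == saved_key:
        # Saved key still exists and was already returned; move past it.
        i += 1
    if i == len(keys):
        return EOF  # nothing after the saved position at all
    candidate = keys[i]
    past_end = candidate > end_key if end_inclusive else candidate >= end_key
    return EOF if past_end else candidate
```

For `keys = ["a", "c", "e", "g"]`, restoring from a deleted `"b"` with end bound `"e"` resumes at `"c"`, while restoring from `"e"` with the same bound reports EOF because the next key, `"g"`, lies past the end.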
| Comment by Louis Williams [ 18/Mar/19 ] |
Quoted from
| Comment by Judah Schvimer [ 18/Mar/19 ] |
louis.williams, IIUC, there will be BFs, but we cannot predict how many? If so, I lean towards marking this as a blocker to escalate its priority.
| Comment by Louis Williams [ 18/Mar/19 ] |
I think it is important to do this for correctness, because unintentional prepare conflicts would be especially difficult for users to diagnose. I'm not sure whether this is blocking passthrough testing, however, because it's hard to predict the magnitude of the failures that might occur.
| Comment by Judah Schvimer [ 16/Mar/19 ] |
louis.williams, am I correct that this is effectively blocking a lot of our passthrough testing for sharded transactions, because we would get hard-to-diagnose BFs as a result? If so, I will mark this as a "Blocker" to escalate its priority.