[SERVER-40176] Cursor seekExact should not use WT_CURSOR:search_near to avoid unintentional prepare conflicts Created: 15/Mar/19  Updated: 26/Oct/23  Resolved: 17/Jun/19

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Won't Fix Votes: 0
Labels: txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-40177 Enforce prepare conflicts on secondaries Closed
is depended on by WT-4580 Abort transactions that perform updat... Closed
Related
related to SERVER-40421 Add failpoint to skip doing retries o... Closed
related to SERVER-82457 Test cases in txn_commit_optimization... Backlog
is related to SERVER-40167 Index key removal should not encounte... Closed
Operating System: ALL
Sprint: Storage NYC 2019-04-08, Storage NYC 2019-05-06, Storage NYC 2019-05-20
Participants:
Linked BF Score: 5

 Description   

As described in SERVER-40167, seeking using search_near can cause operations to encounter unintended prepare conflicts when the queried key is not found.



 Comments   
Comment by Louis Williams [ 07/May/19 ]

We decided we will not be making any changes to the storage API, and instead perform inserts as inserts, instead of upserts, during steady-state replication for SERVER-40177.

Comment by Judah Schvimer [ 01/May/19 ]

Removing from the prepare epic so this can be prioritized by the storage team separately from the rest of the project.

Comment by Alexander Gorrod [ 30/Apr/19 ]

louis.williams I've booked a meeting to talk about this work.

Comment by Louis Williams [ 24/Apr/19 ]

michael.cahill thanks for the summary. This is what I understand are the changes required. I don't think we'll be able to avoid unintentional conflicts on non-unique indexes for any case, but I'm not sure of a solution where that's avoidable.

  • SERVER-40167: cursor restore after remove
    • Work required: return the positioned key when searh_near hits a prepare conflict. If the key returned matches the key requested, then the caller will observe the prepare conflict. If the returned key doesn't not match, allow the caller to use next() to potentially hit any adjacent records that may be prepared. This is prone to prepare conflicts for keys on non-unique indexes. 
  • SERVER-40176 (this ticket)
    • SERVER-40177 is just a symptom of this issue, because updates on secondaries use seekExact via upserts.
    • Work required: use search_prefix to only return records matching a prefix. If the first match is prepared, then the prepare conflict is observed. If there is no match, WT_NOTFOUND is treated as EOF by MongoDB, which is fine because seekExact does not expect to get keys that do not match. This is also prone to hit prepare conflicts for non-unique indexes. 

If this sounds right and appropriate, I can open WT tickets for both changes.

Comment by Michael Cahill (Inactive) [ 15/Apr/19 ]

Here is my attempt to summarize my understanding of the various issues and proposed WT API changes:

  SERVER-40167 SERVER-40176 SERVER-40177
  cursor restore after remove seekExact uses search_near, can get spurious prepare conflicts prepare conflicts in secondary oplog application
ignore_prepare maybe okay but we're trying to get away from ignoring prepared updates not clear it is always safe to ignore conflicts, definitely safe to ignore all conflicts during update transactions should be safe but ignoring conflicts could mask bugs
search prefix not clear this helps? finds the first match: what if it is prepared? if there is no match, returns WT_NOTFOUND, how is that distinguished from EOF? should fix this: secondaries only search _id index
get key after prepare conflict could make the restore successful and only block if cursor->next is called could check the key to decide if the prepare conflict is genuine. Then what – e.g., if the returned key is less than the search key, need to call cursor->next as for SERVER-40176

I'm concerned that search_prefix doesn't help in all cases.  In particular, if documents share an ordinary, non-unique index key, then a prepared transaction in an unrelated document that happens to share the same index key value could still cause blocking when accessing an adjacent document.

If we can agree on a complete set of changes that mean prepare conflicts only cause blocking when accessing the documents involved in a prepared transaction, then we can map out and schedule the WT changes.

If we're saying that prepared conflicts with logically unrelated documents are unavoidable in some cases, maybe the search_prefix approach is sufficient, but then I'd like clearer answers to the questions above.

Comment by Louis Williams [ 02/Apr/19 ]

michael.cahill thanks for talking through this.

would it be sufficient to simply ignore all updates that are part of prepared transactions and allow writes? Or are there cases where the conflict is real and needs to be resolved to provide correct semantics.

I don't think in any case it would be 100% safe to allow writes when ignoring updates that are part of prepared transactions. To be honest, this is hard to reason about. In the case of key removal (SERVER-40167), the key selected for removal could be part of a prepared transaction, and that should encounter a legitimate prepare conflict. It would be incorrect to ignore that.

what would the proposed search_prefix method do if there were entries that match the prefix, but some of them were part of prepared transactions?

In this case, MongoDB would have to traverse through a small set of keys, but search_prefix should still encounter a WT_PREPARE_CONFLICT. The motivation behind this change is to not allow cursors to return records that it wouldn't find useful, especially if that record is prepared. As long as the only keys a cursor iterates over match the provided prefix, this seems correct to me.

Another case we would have to think about is what if there are matching entries that were removed as part of a prepared transaction (these would be visible with ignore_prepared=true)

In this case, again, I think it's fine to return a WT_PREPARE_CONFLICT. Non-unique indexes can return multiple documents for a single key, and if that requires traversing a small set, then it should not skip prepared updates. I also think this case would effectively maintain the current behavior of search_near.

Again imagine the case where there are multiple entries that match the prefix and the first one happens to be prepared: that could return WT_PREPARE_CONFLICT and set the key in the cursor, allowing MongoDB to check that the entry matches. But then what – do we need to be able to move to the next record? What if that is prepared? etc.

This raises a good question about how MongoDB uses cursors, especially for document removal. Even if cursor restoration used search_prefix after deleting a key, I would expect the return value to be WT_NOTFOUND. Then what? How would MongoDB differentiate an EOF from a repositioned cursor that landed on a recently deleted key? Maybe it's possible that if we decide to return the key after hitting a WT_PREPARE_CONFLICT, the _endPosition from the Index scan can be used to consider whether or not the scan is at EOF. Also, I don't imagine this being an issue for the upsert case on secondaries.

Comment by Louis Williams [ 18/Mar/19 ]

Quoted from SERVER-40167

I think the reason  [why this will not block testing] is simply that each passthrough test operates on its own session and its own collection. We’ll never see this issue there because it only happens when different sessions are operating on the same collection. Even for FSM tests, prepared transactions always end up getting committed or aborted. So even if an operation encounters a prepare conflict, the conflicting prepared transactions will eventually finish, and the conflict will resolve. This issue seems isolated only to targeted replica set tests where prepared transactions are held open and another operation is attempted.

Comment by Judah Schvimer [ 18/Mar/19 ]

louis.williams, IIUC, there will be BFs but we cannot predict how many? If so I lean towards marking as a blocker to escalate the priority.

Comment by Louis Williams [ 18/Mar/19 ]

I think it is important to do this for correctness because it would be especially difficult to diagnose from a user perspective. I'm not sure if this is blocking passthrough testing, however, because it's hard to predict the magnitude of failures that might occur.

Comment by Judah Schvimer [ 16/Mar/19 ]

louis.williams, am I correct that this is effectively blocking a lot of our passthrough testing for sharded transactions because we would get hard to diagnose BFs as a result? If so, I will mark this as a "Blocker" to escalate its priority.

Generated at Thu Feb 08 04:54:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.