[SERVER-49781] Setting lastApplied before startup recovery has finished causes race with reconstructing prepared transactions Created: 21/Jul/20  Updated: 24/Jun/21  Resolved: 15/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.2

Type: Bug Priority: Major - P3
Reporter: Samyukta Lanka Assignee: Louis Williams
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-48452 Internal readers should default to re... Closed
is duplicated by WT-6626 WT_SESSION.prepare_transaction: __txn... Closed
Related
is related to SERVER-44529 Re-acquiring locks after a yield and ... Closed
is related to SERVER-46721 Step up may cause reads at PIT with h... Closed
Operating System: ALL
Backport Requested: v4.7, v4.4
Sprint: Execution Team 2020-09-07, Execution Team 2020-09-21
Linked BF Score: 64

 Description   

When preparing a transaction, WiredTiger (WT) requires that the prepare timestamp be greater than the latest active read timestamp. To satisfy this, we often expect readers to do untimestamped reads during startup so that they do not conflict with prepared transactions, which are only reconstructed at the end of recovery. One way we achieved this in the past was by setting lastApplied only at the end of recovery, so that anything reading with a kLastApplied read source did an untimestamped read instead.
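A minimal C++ sketch of the two rules this relies on, assuming a toy model in which timestamps are plain integers: a kLastApplied reader falls back to an untimestamped read while lastApplied is unset, and a prepare is allowed only if its timestamp is greater than every active read timestamp. The names `Timestamp`, `ReadSource`, `resolveReadTimestamp` and `canPrepareAt` are hypothetical and are not the server's or WiredTiger's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy model: timestamps are plain integers and none of these
// names are the real server or WiredTiger API.
using Timestamp = std::uint64_t;

enum class ReadSource { kNoTimestamp, kLastApplied };

// Returns the timestamp the reader will read at. An empty optional means an
// untimestamped read, which in this model registers no read timestamp.
std::optional<Timestamp> resolveReadTimestamp(ReadSource source,
                                              std::optional<Timestamp> lastApplied) {
    if (source == ReadSource::kLastApplied && lastApplied) {
        return lastApplied;
    }
    // kNoTimestamp, or kLastApplied while lastApplied is still unset
    // (the old startup behavior): fall back to an untimestamped read.
    return std::nullopt;
}

// The prepare rule: the prepare timestamp must be greater than every active
// read timestamp. Untimestamped readers (empty optionals) never block it.
bool canPrepareAt(Timestamp prepareTs,
                  const std::vector<std::optional<Timestamp>>& activeReads) {
    return std::all_of(activeReads.begin(), activeReads.end(),
                       [prepareTs](const std::optional<Timestamp>& readTs) {
                           return !readTs || prepareTs > *readTs;
                       });
}
```

In this model, leaving lastApplied unset until after reconstruction is what kept every kLastApplied reader out of the `activeReads` set that `canPrepareAt` checks.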

However, we now set lastApplied on the snapshot manager before finishing recovery (and specifically before reconstructing prepared transactions). We do this to prevent a race with kNoOverlap that could cause a reader to see writes that shouldn't be visible. Ultimately, this means that operations reading with a kLastApplied read source during startup can race with reconstructing prepared transactions.



 Comments   
Comment by Louis Williams [ 11/Sep/20 ]

This should be alleviated by SERVER-48452, but I'm going to keep this ticket open to track my progress on the 4.4 backport.

Comment by Daniel Gottlieb (Inactive) [ 23/Jul/20 ]

Can user requests run during the prepared-transaction reconstruction phase of startup/rollback recovery?

Comment by Louis Williams [ 23/Jul/20 ]

It seems like any user request that reads at kLastApplied could also encounter this problem. The free monitoring service just happened to hit this in our tests. Is that right?

Would it make sense to take both approaches? That is:

  • Reject kLastApplied and kNoOverlap reads when neither lastApplied nor all_durable is available
  • Default internal readers to use kNoTimestamp instead of kLastApplied.
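The two proposals above could fit together roughly as follows. This is a hypothetical C++ sketch, not the server's read-source selection code: `chooseReadTimestamp`, `ReadDecision`, `ErrorCode` and the `internalReader` flag are made-up names. It reads the first bullet literally (kLastApplied needs lastApplied; kNoOverlap fails only when neither lastApplied nor all_durable is known) and assumes kNoOverlap reads at the earlier of whichever of the two timestamps are known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model; none of these names are the server's real API.
using Timestamp = std::uint64_t;

enum class ReadSource { kUnset, kNoTimestamp, kLastApplied, kNoOverlap };
enum class ErrorCode { kOK, kSnapshotUnavailable };

struct ReadDecision {
    ErrorCode code;
    std::optional<Timestamp> readTs;  // empty: untimestamped read (or no read on error)
};

ReadDecision chooseReadTimestamp(ReadSource source, bool internalReader,
                                 std::optional<Timestamp> lastApplied,
                                 std::optional<Timestamp> allDurable) {
    // Second bullet: an internal reader that never chose a read source reads
    // untimestamped instead of implicitly opting into kLastApplied.
    if (source == ReadSource::kUnset) {
        source = internalReader ? ReadSource::kNoTimestamp : ReadSource::kLastApplied;
    }

    switch (source) {
        case ReadSource::kLastApplied:
            // First bullet: fail instead of silently reading untimestamped.
            if (!lastApplied) {
                return {ErrorCode::kSnapshotUnavailable, std::nullopt};
            }
            return {ErrorCode::kOK, lastApplied};
        case ReadSource::kNoOverlap:
            // First bullet: fail only when neither timestamp is known;
            // otherwise read at the earlier of the known ones (assumption).
            if (!lastApplied && !allDurable) {
                return {ErrorCode::kSnapshotUnavailable, std::nullopt};
            }
            if (!lastApplied) return {ErrorCode::kOK, allDurable};
            if (!allDurable) return {ErrorCode::kOK, lastApplied};
            return {ErrorCode::kOK, std::min(*lastApplied, *allDurable)};
        default:  // kNoTimestamp (kUnset was resolved above)
            return {ErrorCode::kOK, std::nullopt};
    }
}
```

With both changes, an internal reader such as the free monitoring processor would no longer pin lastApplied during startup, and an explicit timestamped read would surface SnapshotUnavailable instead of quietly changing behavior.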
Comment by Daniel Gottlieb (Inactive) [ 22/Jul/20 ]

My apologies, I'm familiar with the invariant. I was trying to get more information on what the reader was; because that was omitted, I had assumed the reader was related to reconstructing the prepared updates.

To me, it's a bug if any internal reader can have its read source changed just because it was left unset. My grepping of the free_mon directory doesn't show an explicit read source being set anywhere. Can you confirm that the read source on the problematic reader's recovery unit was simply unset (which would opt into last applied behavior if a timestamp exists)?

Comment by Samyukta Lanka [ 22/Jul/20 ]

Can you confirm that the read source for reconstructing prepared transactions is explicitly using kLastApplied

Sorry for any confusion, the issue isn't that the read source for reconstructing prepared transactions is using kLastApplied, but that other readers are able to read with kLastApplied even though reconstructing prepared transactions hasn't happened yet.

Let's say that the last entry in the oplog is the prepare entry. Any reader (such as the Free Monitoring Processor) using kLastApplied will be reading at the prepareTimestamp. But since we haven't reconstructed the transaction yet, when we go to prepare it in WT, we'll see that there is an active reader at the same timestamp that we want to prepare at, which violates our contract for the prepareTimestamp.
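The interleaving above can be replayed in a single-threaded C++ toy model. Everything here is hypothetical: `reconstructionSucceeds` and its flag are illustrative names, and the WT prepare check is reduced to comparing the prepare timestamp against the active read timestamps. It only shows why publishing lastApplied before reconstruction lets a kLastApplied reader pin the prepare timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

using Timestamp = std::uint64_t;

// Hypothetical replay: the last oplog entry is a prepare at prepareTs. A
// kLastApplied reader (e.g. the free monitoring processor) opens a snapshot
// after recovery may have set lastApplied but before the prepared transaction
// is reconstructed. Returns true if the reconstruction's prepare satisfies
// "prepare timestamp > every active read timestamp".
bool reconstructionSucceeds(bool setLastAppliedBeforeReconstruction, Timestamp prepareTs) {
    std::optional<Timestamp> lastApplied;
    if (setLastAppliedBeforeReconstruction) {
        lastApplied = prepareTs;  // top of the oplog is the prepare entry
    }

    // The reader resolves kLastApplied; with lastApplied unset it reads
    // untimestamped and registers no read timestamp.
    std::vector<Timestamp> activeReadTimestamps;
    if (lastApplied) {
        activeReadTimestamps.push_back(*lastApplied);
    }

    // Reconstructing the prepared transaction: check the prepare rule.
    for (Timestamp readTs : activeReadTimestamps) {
        if (prepareTs <= readTs) {
            return false;  // a reader sits at (or past) the prepare timestamp
        }
    }
    return true;
}
```

`reconstructionSucceeds(false, 100)` models the old ordering and passes; `reconstructionSucceeds(true, 100)` models the current ordering and hits the conflict described above.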

Comment by Daniel Gottlieb (Inactive) [ 22/Jul/20 ]

Edit

Ultimately, this means that operations reading with a kLastApplied read source during startup can race with reconstructing prepared transactions.

I misread this the first time. Can you confirm that the read source for reconstructing prepared transactions is explicitly using kLastApplied, as opposed to accidentally picking up the last applied timestamp because the read source is simply unset? If it is explicitly being set, is there a reason kLastApplied was chosen instead of kNoTimestamp?

Original comment
Setting last applied at startup was also one of the (multiple necessary) changes that resulted in SERVER-48934. I'm curious whether we should audit other startup readers to see which ones now read at a last applied value instead of reading without a timestamp, as they previously would have. I believe we'd be able to investigate that programmatically.

SERVER-48452 (internal readers should default to using kNoTimestamp) sounds like it would also avoid this?

Comment by Eric Milkie [ 22/Jul/20 ]

If there is nothing in the oplog at the end of startup recovery, I believe that corresponds to a system where timestamped writes are not possible (e.g. a replica set node before the initiate command has run). Therefore, we don't need to do anything for lastApplied in this case, although it wouldn't hurt to set it to something like initialDataTimestamp.

Comment by Samyukta Lanka [ 21/Jul/20 ]

One possible solution that louis.williams came up with was to make kNoOverlap reads return SnapshotUnavailable until lastApplied is set. However, we cannot guarantee that lastApplied is set if there is nothing in the oplog at the end of startup recovery. Perhaps we could set lastApplied to the initialDataTimestamp in this case?
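A hypothetical C++ sketch of this proposal together with the initialDataTimestamp fallback suggested above (made-up names, timestamps modeled as integers): kNoOverlap reads fail with SnapshotUnavailable, represented here as an empty optional, until lastApplied is published, and recovery publishes the top of the oplog or, if the oplog is empty, the initialDataTimestamp. It also assumes kNoOverlap reads at the earlier of lastApplied and all_durable. In this model, without the fallback an empty oplog means lastApplied is never published, so kNoOverlap readers keep failing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

using Timestamp = std::uint64_t;

// Hypothetical: the lastApplied value startup recovery would publish.
std::optional<Timestamp> lastAppliedAtEndOfRecovery(std::optional<Timestamp> topOfOplog,
                                                    std::optional<Timestamp> initialDataTimestamp) {
    // Normal case: the newest oplog entry.
    if (topOfOplog) {
        return topOfOplog;
    }
    // Empty oplog: the proposed fallback (stays unset if this is unknown too).
    return initialDataTimestamp;
}

// Hypothetical kNoOverlap gate: an empty result stands for SnapshotUnavailable.
std::optional<Timestamp> noOverlapReadTimestamp(std::optional<Timestamp> lastApplied,
                                                Timestamp allDurable) {
    if (!lastApplied) {
        return std::nullopt;  // lastApplied not published yet
    }
    return std::min(*lastApplied, allDurable);
}
```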

Generated at Thu Feb 08 05:20:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.