[SERVER-40352] Transactions that perform untimestamped reads do not check min visible snapshot for collection Created: 26/Mar/19  Updated: 23/Aug/19  Resolved: 28/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tess Avitabile (Inactive) Assignee: Tess Avitabile (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-41107 Create concurrency test that accesses... Closed
is related to SERVER-43018 Formalize contract for safely accessi... Closed
is related to SERVER-38906 Multi-document transactions should no... Closed
Operating System: ALL
Sprint: Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20, Repl 2019-06-03
Participants:

 Description   

After SERVER-38906, transactions with local or majority readConcern no longer read at a timestamp. This means that we will not check their read timestamp against the collection's minimum visible snapshot here, so we allow these transactions to read from collections that were created after the transaction was opened. This is illegal in the storage engine. Instead we should record what the most recent visible timestamp is before the transaction, and then use that to check the min visible time when reading a collection.



 Comments   
Comment by Tess Avitabile (Inactive) [ 28/May/19 ]

As we have discussed above, since the concurrency test added in SERVER-41107 does not crash the server, we will not check the min visible snapshot for collections accessed in transactions that use untimestamped reads. SERVER-41107 would have crashed the server, if not for the fix already added in SERVER-36005.

Comment by Tess Avitabile (Inactive) [ 10/May/19 ]

I did a couple quick tests by creating a collection or index after the transaction had started and then using the collection/index in the transaction, and the tests did not crash the server. The next step will be to write an FSM test with transactions and DDL operation on the same collection. If the FSM test does not crash the server, then we will make no code changes.

Comment by Tess Avitabile (Inactive) [ 08/May/19 ]

My concern is that now we have a new visibility level that is neither a WT snapshot nor at/after a timestamp. It would introduce new anomalies, such as after renaming a collection you might not be able to see the collection at all. It neither exists under the old name nor the new one. Similarly, creating an index, even on a collection, would make the collection inaccessible.

It's true that the solution I suggested would be a user experience regression, since in 4.0, if you create a collection with w:1, you can then access the collection in a transaction with local readConcern. With this change, the transaction might abort when you attempt to read from the collection. This would be a surprising abort, but we have other surprising aborts in 4.0 due to two-phase drops.

I feel we're solving one specific problem causing extra complexity and creating more problems. I'd rather do something simpler and more consistent, so having a client record the opTime of its last write and use the maximum of the all-committed point and that time for a transaction following that write.

The above solution has the issue that we would still be doing a timestamped read ahead of all-committed, which is illegal. It also wouldn't allow you to read your w:1 writes with local readConcern if you are using multiple clients.

geert.bosch suggested another solution, which is to wait for the lastApplied to become all-committed before starting a transaction with local readConcern. I would like to try this solution and test the performance impact. If the performance impact is not tolerable, I would like to pursue the solution I suggested, where we abort the transaction if a collection's min visible snapshot is greater than the "read timestamp lower bound".

I also realized that 4.0 may have a visibility problem due to performing timestamped reads ahead of the all-committed. Suppose a transaction starts a timestamped read at time 5. Then a collection is created at time 4. We will allow the transaction to read from the collection, even though the collection does not exist in the transaction's snapshot. Is this a bug?

Comment by Geert Bosch [ 08/May/19 ]

My concern is that now we have a new visibility level that is neither a WT snapshot nor at/after a timestamp. It would introduce new anomalies, such as after renaming a collection you might not be able to see the collection at all. It neither exists under the old name nor the new one. Similarly, creating an index, even on a collection, would make the collection inaccessible.

I feel we're solving one specific problem causing extra complexity and creating more problems. I'd rather do something simpler and more consistent, so having a client record the opTime of its last write and use the maximum of the all-committed point and that time for a transaction following that write.

Comment by Tess Avitabile (Inactive) [ 07/May/19 ]

My proposal is to record the all-committed as the "read timestamp lower bound" before opening the WT transaction. Each time we lock a new collection, we will check if the min visible snapshot for the collection is greater than the "read timestamp lower bound" and, if so, abort the transaction. This should ensure that there have been no catalog changes for that collection since the transaction was opened.

geert.bosch, milkie, can you consider this proposal?

Comment by Judah Schvimer [ 01/Apr/19 ]

geert.bosch, I don't understand why reading before the most recent visible timestamp would provide incorrect behavior. Can you please explain this to me?

Comment by Tess Avitabile (Inactive) [ 01/Apr/19 ]

From geert.bosch:

Traditionally we have relied on locking to avoid this anomaly: you first lock the collection, then you start your storage engine transaction which establishes a snapshot. Our 2-phase locking ensures that no collections can be created while you're using your snapshot. Because now an operation can access more than 1 collection, we don't have this guarantee. Instead we should record what the most recent visible timestamp is before the transaction, and then use that to check the min visible time when reading a collection. The fact that we don't do this is a bug.

Generated at Thu Feb 08 04:54:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.