[SERVER-64125] MDB server sets the commit/durable timestamps equal to the stable timestamp Created: 02/Mar/22 Updated: 29/Oct/23 Resolved: 01/Apr/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Keith Bostic (Inactive) | Assignee: | Jordi Olivares Provencio |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution Team 2022-03-21, Execution Team 2022-04-04 |
| Participants: | |
| Linked BF Score: | 35 |
| Description |
|
Generally, it is not correct to allow a transaction to commit with a commit timestamp at the stable timestamp, or a prepared transaction to commit with a durable timestamp at the stable timestamp. This can confuse checkpoint about whether the newly committed transaction should be included in the checkpoint, and can potentially lead to data inconsistencies. With the merge of … We do not know of any actual MDB Server problems in this area, but it would be good to fix any place where this happens and make the WiredTiger standalone behavior apply to all builds, to avoid introducing problems in the future. |
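The rule in the description comes down to a strict inequality against the stable timestamp. Below is a minimal sketch that assumes integer timestamps. `check_commit_timestamps` and `TimestampError` are hypothetical names, not MongoDB or WiredTiger APIs. The sketch also assumes a prepared transaction's durable timestamp defaults to its commit timestamp when none is supplied.

```python
from typing import Optional


class TimestampError(ValueError):
    """Raised when a commit would land at or before the stable timestamp."""


def check_commit_timestamps(stable_ts: int, commit_ts: int,
                            durable_ts: Optional[int] = None,
                            prepared: bool = False) -> int:
    """Return the timestamp compared against stable, or raise TimestampError.

    Hypothetical helper with integer timestamps:
    - ordinary transactions: commit_ts must be strictly greater than stable_ts;
    - prepared transactions: durable_ts must be strictly greater than stable_ts
      (assumed to default to commit_ts when not supplied).
    """
    if prepared:
        checked = commit_ts if durable_ts is None else durable_ts
        name = "durable"
    else:
        checked = commit_ts
        name = "commit"
    # "<=" is deliberate: a timestamp equal to stable is the ambiguous case.
    if checked <= stable_ts:
        raise TimestampError(
            f"{name} timestamp {checked} must be greater than "
            f"stable timestamp {stable_ts}")
    return checked


check_commit_timestamps(stable_ts=100, commit_ts=101)  # accepted
try:
    check_commit_timestamps(stable_ts=100, commit_ts=100)  # equal to stable
except TimestampError as err:
    print(err)  # commit timestamp 100 must be greater than stable timestamp 100
```

The comparison rejects equality as well as anything earlier. Rejecting equality matters because a commit exactly at the stable timestamp is the case where checkpoint cannot tell whether the transaction belongs in it.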
| Comments |
| Comment by Githook User [ 01/Apr/22 ] |
|
Author: Jordi Olivares Provencio <jordi.olivares-provencio@mongodb.com> (jordiolivares)
Message:
| Comment by Daniel Gottlieb (Inactive) [ 14/Mar/22 ] |
|
Thanks for running that experiment again, Keith. I took a look. I only counted 3 unique failures, so I'll list them here:
A disclaimer: everything about how multikey is tracked is considered a wart. So while I describe the behavior, please don't assume there's some virtue to how we do things. We got here because of constraints dating back to WT's adoption. We haven't made the effort to schema our way out of this problem because it's a tech-debt problem that carries on-disk format changes (i.e., upgrade/downgrade) and large risk.

Multikey is an index state in which, most simply, an index contains multiple keys for the same MDB document/record. When an index is not multikey, query can assume that any document returned from the index is unique. But if an index is multikey, query must maintain a set of returned records to avoid returning the same document twice.

When a client inserts the first document that "flips multikey", we do that write on the _mdb_catalog document. Unfortunately, that document tracks a lot of detail beyond the specific index. This means that preparing a change to that document can cause contention for things that are logically unrelated. Thus, for things like flipping multikey in a prepared transaction, we make an effort to change that document in a separate transaction and commit it prior to the data write that it reflects. The system remains correct so long as multikey is set to true at or before the point where multikey documents exist.

When reconstructing prepared transactions, we can find that multikey needs to be set. (I'm not entirely sure why this happens for txns in a prepared state at the stable timestamp; understanding that may give us more ways out of this situation.) Right now, when reconstructing, we can write at the stable timestamp. I think it's just as safe here to write at stable + 1.

That said, I do have a tangential concern about these recovery writes, where the multikey write has a timestamp larger than the prepare timestamp. It means that after the transaction commits, the transaction could choose a visible (commit_)timestamp smaller than the multikey write.
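The de-duplication that query must do for a multikey index can be sketched as follows. This is a hypothetical model, not MongoDB's query code. The index is a list of `(key, record_id)` pairs, and `scan_index` is an illustrative name.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical model: an index is a key-ordered list of (key, record_id) entries.
IndexEntry = Tuple[object, int]


def scan_index(entries: Iterable[IndexEntry], multikey: bool) -> Iterator[int]:
    """Yield record ids reached through the index.

    Not multikey: each entry points at a distinct record, so ids are streamed
    straight through. Multikey: one record can appear under several keys, so a
    set of already-returned ids is kept to avoid returning a record twice.
    """
    if not multikey:
        for _key, rid in entries:
            yield rid
        return
    seen = set()
    for _key, rid in entries:
        if rid not in seen:
            seen.add(rid)
            yield rid


# Record 1 has tags ["a", "b"], so an index on "tags" holds two keys for it.
entries = [("a", 1), ("a", 2), ("b", 1), ("c", 3)]
print(list(scan_index(entries, multikey=True)))   # [1, 2, 3]
print(list(scan_index(entries, multikey=False)))  # [1, 2, 1, 3]
```

The second call shows why the flag must be set at or before multikey documents exist. If the flag is still false when such a document is visible, the scan skips the set and returns record 1 twice.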
This isn't strictly a problem today, because multikey is read from a modern in-memory state (multikey doesn't "go backwards"). But it would be wrong in a versioned-catalog world, where multikey state is derived from the reader's snapshot. |
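A minimal sketch of the timeline behind this concern follows. It assumes integer timestamps, a hypothetical `Timeline` class (no such type exists in the server), and a snapshot-derived catalog in which a reader sees the multikey flag only if the catalog write is at or before its read timestamp. It also assumes a prepared transaction must commit at or after its prepare timestamp. The sketch models the proposed stable + 1 catalog write during reconstruction. It then checks whether a reader at the transaction's commit timestamp would see the multikey data without the multikey flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeline:
    """Hypothetical model of one reconstructed prepared transaction."""
    stable_ts: int
    prepare_ts: int                    # prepared at or before stable
    multikey_ts: Optional[int] = None  # separate _mdb_catalog multikey write

    def reconstruct_multikey_write(self) -> int:
        # Proposed behavior: timestamp the catalog write at stable + 1
        # instead of at the stable timestamp itself.
        self.multikey_ts = self.stable_ts + 1
        return self.multikey_ts

    def multikey_visible_at(self, read_ts: int) -> bool:
        # Versioned-catalog assumption: the flag is visible only if the
        # catalog write exists and is at or before the read timestamp.
        return self.multikey_ts is not None and self.multikey_ts <= read_ts

    def exposes_stale_multikey(self, commit_ts: int) -> bool:
        """True if a reader at commit_ts would see the transaction's
        multikey documents but not the multikey flag."""
        if commit_ts < self.prepare_ts:
            # Assumed rule: commit at or after the prepare timestamp.
            raise ValueError("commit timestamp precedes prepare timestamp")
        return not self.multikey_visible_at(commit_ts)


t = Timeline(stable_ts=100, prepare_ts=95)
t.reconstruct_multikey_write()        # catalog write lands at 101
print(t.exposes_stale_multikey(98))   # True: data visible at 98, flag not until 101
print(t.exposes_stale_multikey(101))  # False
```

In this model, any commit timestamp in `[prepare_ts, multikey_ts)` opens the gap. Today's in-memory multikey state hides the gap because the flag never goes backwards for any reader.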
| Comment by Keith Bostic (Inactive) [ 11/Mar/22 ] |
|
Now that cc: geert.bosch, louis.williams, daniel.gottlieb, alexander.gorrod |
| Comment by Keith Bostic (Inactive) [ 02/Mar/22 ] |