[SERVER-57174] Tightening timestamp order within a single transaction Created: 25/May/21 Updated: 17/Jun/21 Resolved: 17/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Chenhao Qu | Assignee: | Gregory Noma |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Sprint: | Execution Team 2021-06-14, Execution Team 2021-06-28 |
| Participants: |
| Description |
|
We hit a bug in The edge case is that MongoDB performed multiple updates on the same key within the same transaction, but with different timestamps, some of them out of order, e.g., U@20 -> U@15 -> U@20 -> U@10. WiredTiger assumed that there would be no out-of-order timestamped updates within a single transaction, and that assumption is apparently wrong. The WiredTiger team has a fix for this problem, but we also suspect there may be something wrong in the MongoDB code. In WiredTiger, we have a check to ensure the commit timestamp of each update is not smaller than the first commit timestamp.
We ran an experiment that tightened the check so that commit timestamps must be strictly in order, and our MongoDB patch build reported a lot of failures, indicating that MongoDB performs out-of-order timestamped updates quite frequently within a single transaction. Can we explore tightening timestamp usage within a single transaction? |
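The difference between the existing check and the tightened one can be illustrated on the example chain from the description. This is a minimal sketch, not WiredTiger code. The function names are hypothetical, the chain is assumed to be listed newest-first (so the commit order is 10, 20, 15, 20), and "strictly in order" is assumed to mean that no commit timestamp is smaller than the one before it.

```python
def passes_first_timestamp_check(commit_timestamps):
    """Existing check as described: no commit timestamp is smaller than the first one."""
    if not commit_timestamps:
        return True
    first = commit_timestamps[0]
    return all(ts >= first for ts in commit_timestamps)


def passes_in_order_check(commit_timestamps):
    """Tightened check (assumed semantics): commit timestamps never decrease."""
    return all(
        prev <= cur
        for prev, cur in zip(commit_timestamps, commit_timestamps[1:])
    )


# Chain U@20 -> U@15 -> U@20 -> U@10, assumed newest-first, reversed into commit order.
chain_newest_first = [20, 15, 20, 10]
commit_order = list(reversed(chain_newest_first))  # [10, 20, 15, 20]

loose_ok = passes_first_timestamp_check(commit_order)
strict_ok = passes_in_order_check(commit_order)
print(loose_ok, strict_ok)
```

Under these assumptions the example chain passes the existing check (every timestamp is at least 10) but fails the tightened one (15 follows 20), which matches the kind of failure the patch build surfaced.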
| Comments |
| Comment by Chenhao Qu [ 16/Jun/21 ] |
|
gregory.noma Yes, WiredTiger is safe to handle it after |
| Comment by Daniel Gottlieb (Inactive) [ 15/Jun/21 ] |
|
I believe I was able to repro this with a plain old vectored insert:
The MDB transaction this creates (pardon my charming typos):
We probably also haven't observed this, because this doesn't lead to data corruption (a collection containing a multikey document without the appropriate indexes being marked multikey). With vectored inserts we index records for each individual index in insertion/timestamp order. So the multikey updates are always "appropriately" timestamped. This results in something more akin to "ghost timestamps" where we may roll back fewer changes, leaving behind some benign multikey state when the accompanying multikey documents were rolled back. |
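To make the vectored-insert pattern concrete, here is a toy model (not server code). It assumes each document in the batch gets its own timestamp, that all record-store writes are issued first, and that each index is then populated in insertion order. The function name, the tuple layout, and the index names are hypothetical.

```python
def vectored_insert_writes(doc_timestamps, index_names):
    """Return (table, doc_id, commit_timestamp) writes in the order a toy vectored insert issues them."""
    writes = []
    # First pass: write every document to the collection's record store.
    for doc_id, ts in enumerate(doc_timestamps):
        writes.append(("collection", doc_id, ts))
    # Later passes: for each index, add entries in insertion/timestamp order.
    for index in index_names:
        for doc_id, ts in enumerate(doc_timestamps):
            writes.append((index, doc_id, ts))
    return writes


writes = vectored_insert_writes([10, 11, 12], ["idx_a", "idx_b"])
timestamps = [ts for _, _, ts in writes]

# Timestamps drop back to 10 at the start of each pass, so the
# transaction as a whole is out of order.
globally_ordered = all(a <= b for a, b in zip(timestamps, timestamps[1:]))

# Writes to any one table, taken on their own, stay in order.
tables = {table for table, _, _ in writes}
per_table_ordered = all(
    [ts for t, _, ts in writes if t == table]
    == sorted(ts for t, _, ts in writes if t == table)
    for table in tables
)
print(timestamps, globally_ordered, per_table_ordered)
```

In this model a check over the whole transaction fails while a check restricted to each table passes, which is why the distinction between the two scenarios matters for deciding what to assert.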
| Comment by Daniel Gottlieb (Inactive) [ 29/May/21 ] |
|
I found a failure with datafiles to correlate the durable_timestamps in WT with the oplog. I found these are from regular inserts:
lingzhi.deng, I believe this isn't only problematic for WT, but a rollback bug waiting to happen where multikey state on a migrated collection after rollback may not be consistent with the data. This is assuming tenant migrations survive elections. cc samy.lanka |
| Comment by Daniel Gottlieb (Inactive) [ 29/May/21 ] |
|
All of the failures related to that BF are on tenant migrations variants. Here is a bson object pointed to by an update from within WT:
And what I think is the previous update in the chain (variable is actually named WT_UPDATE.next):
My eyeball diff of those two is this index being flipped from multikey: false -> true:
lingzhi.deng how does tenant migration flip multikey in the catalog? Is it understood that's happening in the same WUOW as other catalog writes with different timestamps associated? |
| Comment by Chenhao Qu [ 29/May/21 ] |
|
daniel.gottlieb BF-21248 ( |
| Comment by Gregory Noma [ 29/May/21 ] |
|
In terms of the out-of-order timestamp (on different keys) scenario that Dan mentioned, this is what CollectionImpl::insertDocuments does, which is used in several places including batched user inserts. So it is something that we do quite often. |
| Comment by Daniel Gottlieb (Inactive) [ 29/May/21 ] |
|
We do have at least one intentional use case (at least back on 4.0, and I expect it still exists) of:
I'm not aware of any circumstance where this will create an individual update chain on a document that will have out of order timestamps, so I don't think the weaker assertion (MDB will reset a WT transaction's commit_timestamp) is sufficient to demonstrate the stronger assertion (MDB generates update chains with the same txnid, but out of order timestamps). The best way forward chenhao.qu is to provide us a patch that crashes the server when we hit exactly this condition. With that a server engineer can reproduce and narrow down what's going on. |
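The distinction between the weaker and stronger assertions could be encoded in a detector along the following lines. This is a sketch of the idea only, written in Python rather than as a WiredTiger patch. The tuple layout `(txn_id, key, commit_ts)` and the function name are hypothetical.

```python
def same_key_out_of_order(writes):
    """Yield writes whose commit timestamp is lower than an earlier write
    to the same key by the same transaction (the stronger condition)."""
    last_seen = {}  # (txn_id, key) -> most recent commit timestamp
    for txn_id, key, ts in writes:
        prev = last_seen.get((txn_id, key))
        if prev is not None and ts < prev:
            yield (txn_id, key, prev, ts)
        last_seen[(txn_id, key)] = ts


# Weaker condition only: the commit timestamp is reset, but on different keys.
different_keys = [(1, "a", 20), (1, "b", 15)]
# Stronger condition: one key's update chain goes backwards within a transaction.
same_key = [(1, "a", 10), (1, "a", 20), (1, "a", 15), (1, "a", 20)]

hits_different = list(same_key_out_of_order(different_keys))
hits_same = list(same_key_out_of_order(same_key))
print(hits_different, hits_same)
```

Keying the state on the transaction and the key together means a reset of the commit timestamp across different keys is not flagged, while a backwards step on a single key's chain is. A real patch would abort at the point of detection instead of yielding.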
| Comment by Chenhao Qu [ 28/May/21 ] |
|
gregory.noma The issue in |
| Comment by Gregory Noma [ 28/May/21 ] |
|
chenhao.qu In the description you mention "multiple updates on the same key within the same transaction," but in your patch build it looks like the check is just for out-of-order timestamps within the same transaction, even if those updates are on different keys. If my understanding is correct, which scenario is the one of concern: out-of-order timestamps for the same key within a transaction, or out-of-order timestamps within a transaction regardless of the keys? Or let me know if I'm misunderstanding something. |
| Comment by Louis Williams [ 25/May/21 ] |
|
This appears to be a problem with time-series inserts based on the stack trace in this comment. |