- Type: Bug
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: Replication, Storage, WiredTiger
- None
- Fully Compatible
- ALL
- v4.2
- Execution Team 2019-07-15, Storage Engines 2019-07-01, Execution Team 2019-07-29
- 12
This explanation is incorrect when prepared transactions are getting committed.
The all_committed is (the commit timestamp of the earliest uncommitted transaction that has a commit timestamp) - 1. A prepared transaction is not included in the all_committed until commit time, because until then it has no commit timestamp. At commit time, the all_committed can briefly jump back to commitTimestamp - 1, in the window between setting the commitTimestamp on the transaction and actually committing it.
This invalidates the assumption that the all_committed is always "in the same term" as the commitPoint on a primary.
This also invalidates any assumptions we've made about the all_committed always moving forward.
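To make the backwards jump concrete, here is a toy model of the calculation described above. It is only a sketch, not WiredTiger's code: the class and method names are hypothetical, the timestamps are made-up integers, and capping the result at the largest timestamp committed so far is my assumption about what happens when nothing is in flight.

```python
class ToyCommitTracker:
    """Toy model of all_committed; NOT WiredTiger's implementation."""

    def __init__(self):
        # Largest commit timestamp among fully committed transactions.
        self.max_committed = 0
        # txn id -> commit timestamp, for transactions that have a commit
        # timestamp set but have not committed yet.
        self.pending = {}

    def set_commit_timestamp(self, txn_id, ts):
        self.pending[txn_id] = ts

    def commit(self, txn_id):
        ts = self.pending.pop(txn_id)
        self.max_committed = max(self.max_committed, ts)

    def all_committed(self):
        if not self.pending:
            return self.max_committed
        # (earliest uncommitted commit timestamp) - 1, capped at the largest
        # committed timestamp (the cap is an assumption of this sketch).
        return min(min(self.pending.values()) - 1, self.max_committed)


tracker = ToyCommitTracker()

# An ordinary write is timestamped and committed at 10.
tracker.set_commit_timestamp("write", 10)
tracker.commit("write")
before = tracker.all_committed()

# A prepared transaction was invisible to the calculation until now.
# At commit time it gets commitTimestamp 8, which is behind all_committed.
tracker.set_commit_timestamp("prepared", 8)
during = tracker.all_committed()

tracker.commit("prepared")
after = tracker.all_committed()

print(before, during, after)
```

In this model the value reads 10, then 7 while the prepared transaction sits between "commit timestamp set" and "committed", then 10 again.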
There are 3 options I can think of:
- Change the semantic meaning of all_committed to be all_durable and use the durable timestamp rather than the commit timestamp to calculate it. This is in line with the idea of all_committed really being used to determine when oplog holes are open. michael.cahill thinks this isn't too hard and is reasonable if needed, though it does require more thought since it's a significant API change.
- Add a mechanism for committing a transaction with a commitTimestamp such that it is never counted in calculating all_committed and use it for any storage-transactions (including prepared mongodb transactions) that timestamp their transactions only right before commit time.
- Try to work around the current all_committed behavior in stableTimestamp calculation. This doesn't fix the problem of all_committed moving backwards, if in fact that's a problem in other places where we just haven't seen it.
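As a rough illustration of the first option, the same kind of toy model can track the durable timestamp instead of the commit timestamp. Again this is only a sketch under assumptions: the names are hypothetical, it is not the WiredTiger API, and it assumes that a prepared transaction's durable timestamp is never earlier than its commit timestamp and lands ahead of everything already durable.

```python
class ToyDurableTracker:
    """Toy model of an all_durable calculation; NOT WiredTiger's code."""

    def __init__(self):
        # Largest durable timestamp among fully committed transactions.
        self.max_durable = 0
        # txn id -> durable timestamp of timestamped, uncommitted transactions.
        self.pending = {}

    def set_timestamps(self, txn_id, commit_ts, durable_ts):
        # Assumption: durable timestamp is never earlier than commit timestamp.
        if durable_ts < commit_ts:
            raise ValueError("durable timestamp earlier than commit timestamp")
        # Only the durable timestamp takes part in the calculation.
        self.pending[txn_id] = durable_ts

    def commit(self, txn_id):
        ts = self.pending.pop(txn_id)
        self.max_durable = max(self.max_durable, ts)

    def all_durable(self):
        if not self.pending:
            return self.max_durable
        return min(min(self.pending.values()) - 1, self.max_durable)


tracker = ToyDurableTracker()
readings = []

# Ordinary write: commit and durable timestamps are both 10.
tracker.set_timestamps("write", commit_ts=10, durable_ts=10)
tracker.commit("write")
readings.append(tracker.all_durable())

# Prepared transaction: commitTimestamp 8, but durable at 11.
tracker.set_timestamps("prepared", commit_ts=8, durable_ts=11)
readings.append(tracker.all_durable())

tracker.commit("prepared")
readings.append(tracker.all_durable())

print(readings)
```

With these inputs the readings never decrease, because the commitTimestamp of 8 no longer takes part in the calculation; only the hole at 11 holds the value back until the commit completes.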
- related to: WT-4900 Implement all_durable timestamp (Closed)