[SERVER-41861] stableTimestamp calculation makes incorrect assumptions about all_committed
Created: 21/Jun/19 | Updated: 29/Oct/23 | Resolved: 26/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage, WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.0-rc5, 4.3.1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Judah Schvimer | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2 |
| Sprint: | Execution Team 2019-07-15, Storage Engines 2019-07-01, Execution Team 2019-07-29 |
| Participants: | |
| Linked BF Score: | 12 |
| Description |
|
This explanation is incorrect when prepared transactions are getting committed. The all_committed is the (timestamp of the earliest uncommitted transaction that has a commit timestamp) - 1. For prepared transactions, until commit time the transaction isn't included in the all_committed because it is not timestamped. At commit time, the all_committed can briefly jump back to the commitTimestamp-1 between when we set the commitTimestamp on the transaction and when we actually commit the transaction. This invalidates the assumption that the all_committed is always "in the same term" as the commitPoint on a primary. This also invalidates any assumptions we've made about the all_committed always moving forward. There are 3 options I can think of:
|
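The backward jump described above can be reproduced with a toy model. This is a minimal sketch, not WiredTiger code: `ToyTimestampOracle`, its methods, and the timestamps used are all hypothetical, and the model assumes all_committed is the smaller of the newest committed timestamp and (earliest pending commit timestamp - 1).

```python
from typing import Dict, List


class ToyTimestampOracle:
    """Hypothetical stand-in for the storage engine's timestamp tracking."""

    def __init__(self) -> None:
        self.newest_committed = 0
        # Transactions that have a commit timestamp but have not committed yet.
        self.pending: Dict[str, int] = {}

    def set_commit_timestamp(self, txn_id: str, ts: int) -> None:
        self.pending[txn_id] = ts

    def commit(self, txn_id: str) -> None:
        ts = self.pending.pop(txn_id)
        self.newest_committed = max(self.newest_committed, ts)

    def all_committed(self) -> int:
        if not self.pending:
            return self.newest_committed
        # Assumed rule: held back by the earliest pending commit timestamp.
        return min(self.newest_committed, min(self.pending.values()) - 1)


def simulate_prepared_commit() -> List[int]:
    """Return the all_committed readings seen while a prepared txn commits."""
    oracle = ToyTimestampOracle()
    readings = []
    # Two ordinary writes commit at timestamps 10 and 20.
    for txn_id, ts in (("w1", 10), ("w2", 20)):
        oracle.set_commit_timestamp(txn_id, ts)
        oracle.commit(txn_id)
        readings.append(oracle.all_committed())
    # A transaction prepared earlier was invisible to all_committed until now.
    # It is assigned commit timestamp 15, which is behind the current value.
    oracle.set_commit_timestamp("prepared", 15)
    readings.append(oracle.all_committed())  # window between set and commit
    oracle.commit("prepared")
    readings.append(oracle.all_committed())
    return readings
```

In this model the third reading, taken between setting the commit timestamp and committing, is lower than the second, which is the non-monotonic behavior the description points out.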
| Comments |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: (cherry picked from commit 65f608a4b17440d75ece209e209401e1d74ad638) |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: (cherry picked from commit e6b6a2231ae7f05c3c0f6fc1a0ce111792436e58) |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: (cherry picked from commit 25d5f6a0b01f261e633587013e4ab8116ea2930a) |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: |
| Comment by Githook User [ 26/Jul/19 ] |
|
Author: Gregory Wlodarek (GWlodarek, gregory.wlodarek@mongodb.com)
Message: |
| Comment by Jocelyn del Prado [ 19/Jul/19 ] |
|
milkie, the storage engines work here is done (see |
| Comment by Michael Cahill (Inactive) [ 18/Jul/19 ] |
|
Repeating an offline conversation: WiredTigerRecordStore::oplogDiskLocRegister needs to be changed to set the durable timestamp for prepared transactions (as opposed to the commit timestamp it is currently setting) for this functionality requested by the Replication team to be fully integrated. |
| Comment by Alex Cameron (Inactive) [ 18/Jul/19 ] |
|
kelsey.schubert milkie There will be a WT drop happening shortly. Provided that there's no fallout from that, I'll assign this ticket back to Replication Backlog. |
| Comment by Eric Milkie [ 17/Jul/19 ] |
|
There is a bit of work in the MongoDB code to start consuming the all_durable value out of WT. |
| Comment by Kelsey Schubert [ 17/Jul/19 ] |
|
Is there work that needs to be done under this ticket or can it be closed as a dup of |
| Comment by Alexander Gorrod [ 27/Jun/19 ] |
|
If we make a WiredTiger change to address this, it's possible that we'll need to stage delivery of it, i.e., add something new while retaining the old behavior, then remove the old behavior. Otherwise we'll need to carefully coordinate delivery with the corresponding MongoDB changes. |
| Comment by Judah Schvimer [ 21/Jun/19 ] |
|
I'll assign this to the storage engines team for investigation. |
| Comment by Eric Milkie [ 21/Jun/19 ] |
|
I'd like to further explore option number 1, because it's the most elegant solution, as long as there aren't other issues with it that we haven't thought of yet. |
| Comment by Judah Schvimer [ 21/Jun/19 ] |
|
I don't think it would be possible to work around this just in replication. The stableTimestamp needs to stay behind the all_committed, so any contract where the all_committed can move backwards would make that impossible to guarantee. What do people think of the two storage solutions (1 and 2 above)?
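
To illustrate why replication alone cannot restore the guarantee, here is a minimal sketch. It assumes a simplified rule in which the stable timestamp is the minimum of the commit point and the all_committed value and is never moved backwards once set. The function names and numbers are hypothetical and do not come from the MongoDB code base.

```python
from typing import List, Tuple


def choose_stable_timestamp(commit_point: int, all_committed: int) -> int:
    # Simplifying assumption: stable is capped by both values.
    return min(commit_point, all_committed)


def find_invariant_violations(
    commit_point: int, all_committed_readings: List[int]
) -> List[Tuple[int, int]]:
    """Return (stable, all_committed) pairs where stable is ahead of all_committed."""
    stable = 0
    violations = []
    for reading in all_committed_readings:
        if reading < stable:
            # The stable timestamp was already advanced past this reading
            # and cannot be taken back.
            violations.append((stable, reading))
        stable = max(stable, choose_stable_timestamp(commit_point, reading))
    return violations
```

With a commit point of 18, the readings `[10, 20, 14, 20]` yield one violation, `(18, 14)`: the stable timestamp had already advanced to 18 when all_committed dipped to 14. A monotonic sequence such as `[10, 14, 20]` yields none.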