[SERVER-40870] OpObserverImpl::onTransactionPrepare() reserves OplogSlots rather than using the one reserved by its caller Created: 26/Apr/19 Updated: 13/May/19 Resolved: 09/May/19
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | bigtxns_packing |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Sprint: | Repl 2019-05-06, Repl 2019-05-20 |
| Description |
Large transactions will pack multiple operations into a single applyOps oplog entry, so the caller of OpObserver::onTransactionPrepare() won't know in advance how many oplog slots are needed. To avoid the bug in …, the first proposal is for OpObserverImpl::onTransactionPrepare() to reserve the oplog slots itself rather than use the one reserved by its caller. An alternative solution is to reserve more oplog slots than needed, e.g. one for each operation, but the first proposal imposes fewer restrictions on the caller and simplifies the OpObserver interface. |
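To make the counting problem concrete, here is a minimal C++ sketch. It assumes operations are packed greedily, in order, into applyOps entries up to a byte budget. `Op`, `packIntoApplyOps()`, and the byte limit are hypothetical stand-ins, not the server's real types or limits. Under that assumption, the exact number of slots is only known after packing. Over-reservation, by contrast, only needs the operation count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one transaction operation; only its size matters.
struct Op {
    std::size_t bytes;
};

// Greedily packs operations, in order, into applyOps groups whose total size
// stays within maxBytes (an assumed limit). An oversized op gets its own group.
std::vector<std::vector<Op>> packIntoApplyOps(const std::vector<Op>& ops,
                                              std::size_t maxBytes) {
    assert(maxBytes > 0);
    std::vector<std::vector<Op>> groups;
    std::size_t current = 0;
    for (const Op& op : ops) {
        if (groups.empty() || current + op.bytes > maxBytes) {
            groups.emplace_back();  // start a new applyOps entry
            current = 0;
        }
        groups.back().push_back(op);
        current += op.bytes;
    }
    return groups;
}

// Exact slots: one per applyOps entry, known only after packing has run.
std::size_t exactSlotsNeeded(const std::vector<Op>& ops, std::size_t maxBytes) {
    return packIntoApplyOps(ops, maxBytes).size();
}

// Over-reservation alternative: one slot per operation. Every group holds at
// least one op, so this is always an upper bound on the exact count.
std::size_t overReservedSlots(const std::vector<Op>& ops) {
    return ops.size();
}
```

With five 40-byte operations and a 100-byte budget, packing yields three applyOps entries, while over-reservation reserves five slots. The caller only sees the operations. The exact count depends on packing details inside the OpObserver, which is the motivation for moving the reservation there.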
| Comments |
| Comment by Siyuan Zhou [ 13/May/19 ] |
I believe we are on the same page. In matthew.russotto's |
| Comment by Judah Schvimer [ 13/May/19 ] |
I agree, this would be a problem. This, however, only means we cannot write the final "prepare" oplog entry before calling prepare. We could still write "partialTxn" oplog entries if that were helpful.
How will we give the "partialTxn" entries optimes? Will they be written into the oplog before reserving the prepare oplog slot? |
| Comment by Siyuan Zhou [ 10/May/19 ] |
According to the WiredTiger documentation, that seems to imply update conflicts will be returned. When prepare() fails, I'm wondering whether we can still write the prepare entry. The contract of the oplog is that the corresponding operations should behave as if they happened in oplog order. I'm afraid that even if an abort follows the prepare (perhaps after some concurrent ops), it exposes a "prepared" state in the oplog that never existed on the primary, which seems to violate that contract. The "partialTxn" entries don't need to reserve oplog slots. Only the last one, for the implicit prepare, should, since it determines the prepareTimestamp. |
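The following C++ toy model roughly illustrates this ordering. The "partialTxn" entries take timestamps as they are written, with no reservation. A single slot is then reserved for the final implicit prepare entry, and its timestamp serves as the prepareTimestamp. `ToyOplog` and `logLargePreparedTxn()` are hypothetical. The sketch is single-threaded and ignores the concurrency and storage-engine timestamp handling that real oplog slot reservation exists for.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical single-threaded oplog: every reservation or unreserved write
// takes the next value of a monotonically increasing counter.
class ToyOplog {
public:
    struct Entry {
        std::uint64_t ts;
        std::string kind;
    };

    // Reserves a timestamp now; the entry is written later with appendAt().
    std::uint64_t reserveSlot() { return ++lastTs_; }

    // Writes an entry at a freshly assigned timestamp (no prior reservation).
    std::uint64_t append(const std::string& kind) {
        const std::uint64_t ts = ++lastTs_;
        entries_.push_back({ts, kind});
        return ts;
    }

    // Writes an entry at a timestamp previously handed out by reserveSlot().
    void appendAt(std::uint64_t ts, const std::string& kind) {
        assert(ts <= lastTs_);
        entries_.push_back({ts, kind});
    }

    const std::vector<Entry>& entries() const { return entries_; }

private:
    std::uint64_t lastTs_ = 0;
    std::vector<Entry> entries_;
};

// Writes numPartial "partialTxn" entries without reservations, then reserves
// the single slot for the implicit prepare entry and returns its timestamp,
// which plays the role of the prepareTimestamp.
std::uint64_t logLargePreparedTxn(ToyOplog& oplog, int numPartial) {
    for (int i = 0; i < numPartial; ++i) {
        oplog.append("partialTxn");
    }
    const std::uint64_t prepareTs = oplog.reserveSlot();
    oplog.appendAt(prepareTs, "prepare");
    return prepareTs;
}
```

In this model, the prepare slot is reserved after the partialTxn entries are written. The prepareTimestamp therefore orders after all of their optimes.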
| Comment by Judah Schvimer [ 10/May/19 ] |
While prepare can fail, I don't think it can get a write conflict.
Can you clarify how this will work in more depth? |
| Comment by Siyuan Zhou [ 09/May/19 ] |
Discussed with matthew.russotto: we cannot write oplog entries before calling prepare() on the WUOW, since prepare() can fail. Once it fails, we cannot write the oplog entry, whether or not we write an abort immediately afterward; otherwise, secondaries will pick up the prepare entry and hit the same write conflict. As part of … Closing this as "Won't Fix". |
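Here is a minimal C++ sketch of the resulting ordering constraint: prepare first, and log the prepare entry only on success. `PrepareStatus` and `prepareThenLog()` are hypothetical names. The callback stands in for prepare() on the WUOW, and the vector of strings stands in for the oplog.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of preparing the storage-engine transaction.
enum class PrepareStatus { kOk, kWriteConflict };

// Calls prepare first and appends the "prepare" oplog entry only on success.
// On failure nothing is logged, not even a prepare immediately followed by an
// abort, because a secondary replaying that prepare could hit the same conflict.
bool prepareThenLog(const std::function<PrepareStatus()>& prepareStorageTxn,
                    std::vector<std::string>& oplog) {
    const std::size_t sizeBefore = oplog.size();
    if (prepareStorageTxn() != PrepareStatus::kOk) {
        // Invariant: a failed prepare leaves the oplog untouched.
        assert(oplog.size() == sizeBefore);
        return false;
    }
    oplog.push_back("prepare");
    return true;
}
```

Because the entry is written only after prepare() succeeds, the OpObserver cannot log it ahead of time with a slot reserved by the caller.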
| Comment by Judah Schvimer [ 29/Apr/19 ] |
I want to mention a third solution siyuan.zhou and I discussed and rejected: construct the applyOps entries we will eventually log, reserve their optimes, and then log them with the correct optimes. We then do not need to reserve extra oplog slots. However, this exposes an undesirable amount of OpObserver behavior to the caller. |
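For comparison, here is a minimal C++ sketch of that rejected approach. Entries are built first, then exactly as many optimes are reserved as there are entries, and each entry is stamped with its optime. `PendingEntry`, `SlotAllocator`, and `reserveAndStamp()` are hypothetical names, and the allocator ignores concurrency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical applyOps entry built before any optime is known.
struct PendingEntry {
    std::string payload;
    std::uint64_t optime = 0;  // 0 means "not yet assigned"
};

// Hypothetical allocator that hands out `count` consecutive optimes.
class SlotAllocator {
public:
    std::vector<std::uint64_t> reserve(std::size_t count) {
        std::vector<std::uint64_t> slots;
        for (std::size_t i = 0; i < count; ++i) {
            slots.push_back(++last_);
        }
        return slots;
    }

private:
    std::uint64_t last_ = 0;
};

// Build-then-reserve: reserve exactly entries.size() optimes, so no extra
// slots are wasted, then stamp each already-constructed entry in order.
void reserveAndStamp(std::vector<PendingEntry>& entries, SlotAllocator& alloc) {
    const std::vector<std::uint64_t> slots = alloc.reserve(entries.size());
    assert(slots.size() == entries.size());
    for (std::size_t i = 0; i < entries.size(); ++i) {
        entries[i].optime = slots[i];
    }
}
```

This avoids over-reservation, but whoever calls `reserveAndStamp()` must already hold the fully built applyOps entries. Here, that means knowing how the OpObserver packs operations, which is the exposure the comment objects to.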