[SERVER-49949] Reconstructing prepared transactions containing multi-key writes crashes the initial syncing node. Created: 28/Jul/20 Updated: 29/Oct/23 Resolved: 14/Aug/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.1, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Lingzhi Deng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Repl 2020-08-24 |
| Participants: | |
| Linked BF Score: | 20 |
| Description |
|
On-disk catalog multi-key update for multi-statement transactions happens in a side transaction block. This happens in both 4.4 and master. |
| Comments |
| Comment by Samyukta Lanka [ 19/Jan/21 ] |
|
The original problem can't happen on 4.2, but we're seeing that the prepareTimestamp can be earlier than the oldestTimestamp during startup recovery, which will cause an issue while trying to set the timestamp for the catalog multi-key update. Requesting a backport to 4.2 since the rounding logic will solve this issue as well. |
| Comment by Githook User [ 24/Aug/20 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'} Message: (cherry picked from commit 5821e5bc1e2e8c4ed3e791a60a104d57f104caf1) |
| Comment by Suganthi Mani [ 19/Aug/20 ] |
|
tess.avitabile, 4.2 is not affected by this issue. |
| Comment by Tess Avitabile (Inactive) [ 19/Aug/20 ] |
|
lingzhi.deng, suganthi.mani, do you know if this affects 4.2? |
| Comment by Githook User [ 14/Aug/20 ] |
|
Author: {'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com', 'username': 'ldennis'} Message: |
| Comment by Lingzhi Deng [ 12/Aug/20 ] |
|
After taking a look at the problem, I think this is probably not a one-liner fix, because we could be reconstructing a prepared transaction behind the oldest timestamp during initial sync. If we use the timestamp of the prepare entry to do the multikey write, we would be writing behind the oldest timestamp. I think that is why we needed setRoundUpPreparedTimestamps, so we will need some kind of round-up for the multikey writes as well. I tried using the max of (prepare timestamp, oldest timestamp) for the multikey write and it seems to work. daniel.gottlieb, do you think this is a reasonable approach? Another idea is to use the initialDataTimestamp. I think this is also safe, as no reader is allowed at a timestamp earlier than that after the initial sync completes. So it doesn't seem to matter which timestamp we use as long as it is <= initialDataTimestamp. |
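The round-up idea in this comment can be illustrated with a minimal sketch. It is not the server's C++ implementation: `Timestamp` and `multikey_write_timestamp` are hypothetical names invented for illustration, and timestamps are modeled as simple (seconds, increment) pairs.

```python
from typing import NamedTuple


class Timestamp(NamedTuple):
    """Simplified stand-in for a (seconds, increment) timestamp.

    NamedTuple instances compare lexicographically, which matches the
    ordering assumed here: seconds first, then increment.
    """
    secs: int
    inc: int


def multikey_write_timestamp(prepare_ts: Timestamp, oldest_ts: Timestamp) -> Timestamp:
    """Hypothetical helper: pick max(prepare timestamp, oldest timestamp).

    If the prepare entry is behind the oldest timestamp, the multikey
    write is rounded up so that it never lands behind oldest.
    """
    return max(prepare_ts, oldest_ts)


# Prepare entry behind the oldest timestamp (the initial sync case in the comment).
behind = multikey_write_timestamp(Timestamp(100, 1), Timestamp(120, 0))

# Prepare entry at or after the oldest timestamp: it is used unchanged.
ahead = multikey_write_timestamp(Timestamp(130, 2), Timestamp(120, 0))
```

In the first case the write is moved up to the oldest timestamp; in the second the prepare timestamp is kept as is.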
| Comment by Suganthi Mani [ 03/Aug/20 ] |
|
kelsey.schubert, I just want to bring this ticket to your attention, as this bug exists on 4.4. As per "4.4 Backports Post GA schedule by team", I think the replication team's backports are scheduled for August 21. Do you see any urgency for this ticket to be backported before August 21? |
| Comment by Suganthi Mani [ 29/Jul/20 ] |
|
We don't hit this problem for startup/rollback recovery because we set recoveryPrepareOpTime to prepareTimestamp when the node's oplog application mode is recovering. As a result, this prepareTimestamp is used to timestamp the catalog multi-key update. We should have a similar solution for the initial sync case. It's actually safe to use the prepare timestamp for the catalog multi-key update during initial sync and startup/rollback recovery, because that timestamp is guaranteed to be less than or equal to the commit timestamp of the transaction, and the contract is that the multi-key write must occur at a time <= the first write that makes an index multi-key (see this comment). Note: Secondary oplog application doesn't use the side transaction block for the catalog multi-key update because the opCtx that performs the catalog write is different from the opCtx that applies the prepared transaction. The fix should be a one-line change. tess.avitabile, do you know what the priority of this ticket would be? We consistently crash the initial syncing node when reconstructing prepared transactions containing multi-key writes on 4.4. |
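The safety argument in this comment rests on two inequalities: the prepare timestamp is <= the commit timestamp, and the multi-key write must be <= the first write that makes the index multi-key. A small sketch of that check follows. The function name is hypothetical, timestamps are modeled as plain (seconds, increment) tuples, and it is assumed for illustration that the transaction's first multi-key write becomes visible at its commit timestamp.

```python
def multikey_write_is_safe(multikey_ts: tuple, first_multikey_write_ts: tuple) -> bool:
    """Hypothetical check of the contract: the catalog multi-key write
    must occur at a time <= the first write that makes the index multi-key."""
    return multikey_ts <= first_multikey_write_ts


prepare_ts = (100, 1)  # timestamp of the prepare oplog entry
commit_ts = (105, 3)   # assumed visibility point of the transaction's writes

# Timestamping the catalog update at the prepare timestamp satisfies the contract,
# because prepare_ts <= commit_ts.
safe_at_prepare = multikey_write_is_safe(prepare_ts, commit_ts)

# A timestamp later than the commit timestamp would violate it.
safe_after_commit = multikey_write_is_safe((106, 0), commit_ts)
```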