[SERVER-58184] Checkpoint thread causes assertions when raced with recovering prepared transactions on startup Created: 30/Jun/21 Updated: 29/Oct/23 Resolved: 19/Aug/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.6 |
| Fix Version/s: | 5.0.3, 4.4.9, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Pavithra Vetriselvan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v5.0, v4.4, v4.2
|
||||||||||||||||
| Sprint: | Repl 2021-07-12, Repl 2021-07-26, Execution Team 2021-08-09, Execution Team 2021-08-23 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 120 | ||||||||||||||||
| Description |
|
The checkpoint thread reads at the stable timestamp to evaluate the amount of oplog necessary for rollback. If a checkpoint is taken during server startup or after a rollback when we are reconstructing prepared transactions, it may be possible to hit an assertion like this in WiredTiger:
This problem has only reproduced on the code coverage builder, which is extremely slow, and no users have reported it. It has also only been reproduced on 4.4, but it likely affects every version from 4.2 to 5.1. A workaround may be to take a global X lock while reconstructing prepared transactions, so that reconstruction conflicts with the checkpoint thread. |
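The proposed workaround can be sketched with a toy lock model. This is only an illustration in Python, not MongoDB's lock manager: `GlobalLock`, `run_checkpoint`, and `reconstruct_prepared_transactions` are hypothetical names, and the sketch assumes the checkpoint thread takes the global lock in a mode that conflicts with X. To stay single-threaded, the checkpoint here skips instead of blocking.

```python
class GlobalLock:
    """Toy global lock: many intent holders, or one exclusive (X) holder."""

    def __init__(self):
        self.exclusive_held = False
        self.intent_holders = 0

    def try_acquire_intent(self):
        # Intent mode conflicts with an exclusive holder.
        if self.exclusive_held:
            return False
        self.intent_holders += 1
        return True

    def release_intent(self):
        self.intent_holders -= 1

    def try_acquire_exclusive(self):
        # Exclusive mode conflicts with every other holder.
        if self.exclusive_held or self.intent_holders:
            return False
        self.exclusive_held = True
        return True

    def release_exclusive(self):
        self.exclusive_held = False


def run_checkpoint(lock, log):
    """Hypothetical checkpoint: only runs if it can take the lock in intent mode."""
    if not lock.try_acquire_intent():
        log.append("checkpoint skipped")
        return False
    try:
        log.append("checkpoint ran")
        return True
    finally:
        lock.release_intent()


def reconstruct_prepared_transactions(lock, log, during=lambda: None):
    """Hypothetical reconstruction: holds the lock in X mode for its whole duration."""
    if not lock.try_acquire_exclusive():
        raise RuntimeError("global lock busy")
    try:
        log.append("reconstruct begin")
        during()  # stands in for a checkpoint attempt racing with reconstruction
        log.append("reconstruct end")
    finally:
        lock.release_exclusive()
```

In this model, a checkpoint attempted while reconstruction holds the X lock cannot run, and one attempted afterwards can.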
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 20/Aug/21 ] |
|
Author: {'name': 'Pavi Vetriselvan', 'email': 'pavithra.vetriselvan@mongodb.com', 'username': 'pvselvan'} Message: (cherry picked from commit 841cff317bf34a320f9b8be24cdf27faf4393cbb) |
| Comment by Githook User [ 20/Aug/21 ] |
|
Author: {'name': 'Pavi Vetriselvan', 'email': 'pavithra.vetriselvan@mongodb.com', 'username': 'pvselvan'} Message: (cherry picked from commit 841cff317bf34a320f9b8be24cdf27faf4393cbb) |
| Comment by Githook User [ 19/Aug/21 ] |
|
Author: {'name': 'Pavi Vetriselvan', 'email': 'pavithra.vetriselvan@mongodb.com', 'username': 'pvselvan'} Message: |
| Comment by Louis Williams [ 30/Jun/21 ] |
|
I just discussed with samy.lanka and we think that moving the callback registration after reconstruction (the call to replCoord->startup) should work. The oplog will not be truncated until that callback is registered. See here. |
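The ordering change suggested in this comment can be sketched as follows. This is a toy model in Python under stated assumptions, not the server's startup code: `Checkpointer`, `register_oplog_needed_callback`, and `startup` are hypothetical names, and the sketch assumes that a checkpoint taken before the callback is registered neither reads at the stable timestamp nor truncates any oplog.

```python
class Checkpointer:
    """Toy checkpointer that consults an optional 'oplog needed for rollback' callback."""

    def __init__(self):
        self._oplog_needed_cb = None
        self.events = []

    def register_oplog_needed_callback(self, cb):
        self._oplog_needed_cb = cb

    def take_checkpoint(self):
        if self._oplog_needed_cb is None:
            # Conservative path: no read at the stable timestamp, nothing truncated.
            self.events.append("checkpoint without callback: oplog kept")
            return None
        needed_from = self._oplog_needed_cb()
        self.events.append(f"checkpoint with callback: oplog needed from {needed_from}")
        return needed_from


def startup(checkpointer, reconstruct_prepared_transactions, oplog_needed_cb):
    """Hypothetical startup order: reconstruct first, register the callback last."""
    reconstruct_prepared_transactions()
    checkpointer.register_oplog_needed_callback(oplog_needed_cb)
```

With this order, a checkpoint that races with reconstruction takes the conservative path, and only later checkpoints evaluate how much oplog is needed.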
| Comment by Daniel Gottlieb (Inactive) [ 30/Jun/21 ] |
|
Ah thanks for the explanation. My head wasn't on right, I thought we were preparing things as part of the oplog application step in startup recovery. It makes sense that we're hitting this because we're re-preparing arbitrarily old transactions.
The only bit that depends on that (I hope) is that we don't truncate any data that's needed for oplog recovery. We should be "conservative" today and not truncate anything until after that function is registered. I'm a little concerned about the MODE_X lock given the FTDC gaps that can create (I'm assuming FTDC is turned on at this point) and there have been a couple complaints lately regarding how little visibility we have into the system at startup and shutdown. |
| Comment by Samyukta Lanka [ 30/Jun/21 ] |
|
My understanding of why we don't hit this during steady state replication is that we shouldn't be preparing a transaction at a timestamp before the timestamp that the checkpoint thread reads at (because the stable timestamp wouldn't have advanced if we had reserved an oplog slot for prepare). We need the out-of-order prepare that happens during reconstructing prepared transactions in recovery. |
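The reasoning in this comment can be illustrated with a small toy model in Python. It makes several assumptions that are not from the ticket: timestamps are plain integers, `stable_timestamp` and `prepare_is_before_reader` are hypothetical helpers, and the stable timestamp is modeled as staying strictly below any oplog slot still reserved for a prepare. How the real server treats a prepare timestamp equal to the read timestamp is not covered here.

```python
def stable_timestamp(all_committed, reserved_prepare_slots):
    """Toy rule: the stable timestamp may not reach a slot reserved for a prepare."""
    candidates = [all_committed] + [slot - 1 for slot in reserved_prepare_slots]
    return min(candidates)


def prepare_is_before_reader(prepare_ts, reader_ts):
    """True when a prepare lands before the timestamp the checkpoint thread reads at."""
    return prepare_ts < reader_ts


# Steady state: the slot at 50 is reserved before the prepare happens,
# so the stable timestamp (and the checkpoint's read) stays below it.
steady_stable = stable_timestamp(all_committed=100, reserved_prepare_slots=[50])
steady_out_of_order = prepare_is_before_reader(50, steady_stable)

# Recovery: the stable timestamp is already at 100 when an old transaction
# is re-prepared at 50, so the prepare is out of order for the reader.
recovery_stable = stable_timestamp(all_committed=100, reserved_prepare_slots=[])
recovery_out_of_order = prepare_is_before_reader(50, recovery_stable)
```

In the steady-state case the prepare is never behind the reader, while the recovery case produces exactly the out-of-order prepare described above.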
| Comment by Louis Williams [ 30/Jun/21 ] |
|
1. My guess is that during steady-state we are using a prepare timestamp that doesn't overlap with the stable timestamp. But I don't know how that's guaranteed. |
| Comment by Daniel Gottlieb (Inactive) [ 30/Jun/21 ] |
|
Two questions:
|
| Comment by Louis Williams [ 30/Jun/21 ] |
|
samy.lanka or daniel.gottlieb, do you have any sense of whether my idea is terrible? If we consider introducing a different concurrency control mechanism to stop checkpoints while recovering prepared transactions, I think the global X lock looks pretty attractive. Keep in mind we probably need to backport this to 4.2. |