[SERVER-39949] Replace uses of UninterruptibleLockGuard in TransactionParticipant Created: 04/Mar/19 Updated: 06/Dec/22 Resolved: 17/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | prepare_errors, prepare_optional | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Replication
|
||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
We've added UninterruptibleLockGuards in certain parts of the TransactionParticipant to prevent interruptibility. UninterruptibleLockGuards don't stop all types of interruptions, just ones in locks. This should be done after |
| Comments |
| Comment by Judah Schvimer [ 07/Mar/19 ] |
|
I moved it per your suggestion. I am concerned that we could add another interrupt point and forget to change this. That said, it would lead to a crash, not data corruption, so we can fix it then. Moved it to "prepare_optional". |
| Comment by Siyuan Zhou [ 07/Mar/19 ] |
|
judah.schvimer, given Dan and Eric's comments above, I think it's correct to use UninterruptibleLockGuard since the only thing that can call checkForInterrupt() for the above two cases is lock acquisition. The case in |
| Comment by Eric Milkie [ 06/Mar/19 ] |
|
Yes; on primaries, we take steps to make sure that assumption is true even on one-node replica sets by setting the stable optime carefully (see ReplicationCoordinatorImpl::_chooseStableOpTimeFromCandidates()). It's always true on secondaries because the logic for doing writes is different there. |
| Comment by Daniel Gottlieb (Inactive) [ 06/Mar/19 ] |
|
A node cannot checkpoint in front of the all_committed (i.e: the stable_timestamp cannot be in front of the all_committed). Making that true in all configurations (single voting primaries, EMRC=off) isn't trivial, but fundamentally, that invariant must hold or else replication recovery wouldn't start from the correct point in the oplog and could miss entries earlier than the "checkpoint timestamp". |
| Comment by Judah Schvimer [ 06/Mar/19 ] |
daniel.gottlieb or milkie, this sounds like a fair assumption, do you agree? We only take checkpoints at stable_timestamps which are always behind the all_committed by definition, so I think we're good. |
| Comment by Siyuan Zhou [ 06/Mar/19 ] |
|
judah.schvimer pointed out the only places that check for interrupts in those two cases are all about lock acquisition, so I think it's at least correct to keep using UninterruptibleLockGuards for now. When we want to make shutdown not blocked by these two oplog writes in the future, we could use runWithoutInterruptionExceptAtGlobalShutdown instead and allow exceptions thrown. They will be caught higher in the stack to return an error status to the client. The data should be consistent anyway as Judah mentioned above. |
| Comment by Siyuan Zhou [ 06/Mar/19 ] |
This will be a problem anyway, with or without shutdown. Do we have an idea to avoid that for steady state when we don't have to pin stable timestamp to the oldest active transaction timestamp? I'm not sure whether the "Oplog Durability" project will save us here because it only guarantees the durable timestamps for both data write and oplog write are the same. If a checkpoint is taken in between, it can still only include the data without the oplog write, unless the stable timestamp is aware of oplog holes. For example, it will be safe if a node can never take a checkpoint beyond all_committed. |
| Comment by Judah Schvimer [ 06/Mar/19 ] |
|
There are two UninterruptibleLockGuards in the TransactionParticipant. Both are uninterruptible because an interruption would be caught in a try...catch and cause a std::terminate(). I don't think it is safe for these places to be interrupted even at shutdown. If it needs to be safe, then we have to replace the std::terminate() in the catch with something that will leave the node in the correct state. Marking the opCtx as opCtx->setIsExecutingShutdown seems like the wrong thing to do here, though. I think we need to use runWithoutInterruptionExceptAtGlobalShutdown() as siyuan.zhou says, but catch interrupt exceptions (which will always be due to shutdown in this case). The two places are:
|
| Comment by Siyuan Zhou [ 05/Mar/19 ] |
|
After |