[SERVER-38297] Killing session on a secondary currently applying prepare oplog entry can fassert Created: 28/Nov/18 Updated: 29/Oct/23 Resolved: 14/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | open_todo_in_code, prepare_errors, todo_in_code |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2018-12-17, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25 |
| Description |
|
When secondaries apply the oplog entry that prepares a transaction, they check out the corresponding MongoD session. If the session is killed while the entry is being applied, the operation context applying it may be interrupted, leading to an fassert here in SyncTail. Example failure:
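The failure mode described above can be illustrated with a minimal standalone sketch. Every name in it (`ToyOpCtx`, `ToySession`, `applyPrepareEntry`, `ApplyResult`) is hypothetical and is not the server's real code; the fatal path is modeled as a return value rather than a process abort so that it can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for an operation context; not the server's real class.
struct ToyOpCtx {
    std::atomic<bool> killed{false};

    // Interruption point: throws if the operation has been marked killed.
    void checkForInterrupt() const {
        if (killed.load()) {
            throw std::runtime_error("operation was interrupted");
        }
    }
};

// Hypothetical session: killing it marks whichever operation has it checked out.
struct ToySession {
    ToyOpCtx* checkedOutBy = nullptr;

    void kill() {
        if (checkedOutBy != nullptr) {
            checkedOutBy->killed.store(true);
        }
    }
};

enum class ApplyResult { kApplied, kFatal };

// Toy applier for a "prepare" entry. It checks out the session, then reaches an
// interruption point. Any exception is treated as fatal, which stands in for
// the fassert described in the ticket.
ApplyResult applyPrepareEntry(ToyOpCtx& opCtx, ToySession& session, bool killDuringApply) {
    assert(session.checkedOutBy == nullptr);  // the session must be free
    session.checkedOutBy = &opCtx;
    ApplyResult result = ApplyResult::kApplied;
    try {
        if (killDuringApply) {
            session.kill();  // simulates a kill arriving while the entry is applied
        }
        opCtx.checkForInterrupt();
    } catch (const std::exception&) {
        result = ApplyResult::kFatal;  // the applier never expects to be interrupted
    }
    session.checkedOutBy = nullptr;
    return result;
}
```

Calling `applyPrepareEntry` with `killDuringApply` set to `true` takes the fatal path, while the same call without the kill applies cleanly.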
|
| Comments |
| Comment by Kaloian Manassiev [ 03/May/19 ] |
|
Yes, not sure how I managed to tag it as 4.0.7, but it was a mistake. Thanks for pointing it out! |
| Comment by Maria van Keulen [ 03/May/19 ] |
|
kaloian.manassiev I see the fix version for this ticket is 4.0.7 but the commits were tagged as r4.1.9. Should this fixVersion be changed to 4.1.9? |
| Comment by Kaloian Manassiev [ 14/Mar/19 ] |
|
The only test that remains disabled now is `multi_statement_transaction_kill_sessions_atomicity_isolation.js`, which is failing for unrelated reasons, and I am going to track it under |
| Comment by Githook User [ 13/Mar/19 ] |
|
Author: {'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'} Message: |
| Comment by Githook User [ 13/Mar/19 ] |
|
Author: {'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'} Message: |
| Comment by Kaloian Manassiev [ 07/Mar/19 ] |
|
Actually, Judah already pointed me to the change done under To your question about why we allow killSession on secondaries: why would that be disallowed? There could be secondary reads under a session, and these need to be killed somehow. In terms of who should own a session, the answer is that there should always be a single owner of a session at a time on a replica set. The model where secondary application checks out the session doesn't exactly jibe with that, but since we don't allow transactions on secondaries, this probably doesn't matter right now. |
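The "single owner of a session at a time" rule in the comment above can be sketched as an exclusive check-out table. This is a toy illustration only: `ToySessionCatalog`, its methods, and the integer owner ids are hypothetical and do not reflect the server's actual session catalog.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical catalog that allows at most one owner per session id.
class ToySessionCatalog {
public:
    // Returns true if the caller became the owner, false if the session is
    // already checked out by someone else.
    bool tryCheckOut(const std::string& sessionId, int ownerId) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _owners.emplace(sessionId, ownerId).second;
    }

    // Releases the session; only the current owner may check it in.
    void checkIn(const std::string& sessionId, int ownerId) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _owners.find(sessionId);
        assert(it != _owners.end() && it->second == ownerId);
        (void)ownerId;  // silences the unused warning when asserts are disabled
        _owners.erase(it);
    }

    // Returns the current owner id, or -1 if the session is not checked out.
    int ownerOf(const std::string& sessionId) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _owners.find(sessionId);
        return it == _owners.end() ? -1 : it->second;
    }

private:
    mutable std::mutex _mutex;
    std::unordered_map<std::string, int> _owners;
};
```

In this sketch a second `tryCheckOut` for the same id fails until the first owner calls `checkIn`, which is the property the comment asks for. A real catalog would more likely block the second caller than refuse it.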
| Comment by Siyuan Zhou [ 07/Mar/19 ] |
|
The plan makes sense to me. Shutdown came to my mind initially, but I don't think we use InterruptedDueToStepDown to shut down secondary application. We share the opCtx for all operations on the same writer in a batch. Currently, any command that checks out a session runs in its own batch, so this isn't a problem. But that makes me feel we need to disable interruptions for all secondary application. A behavioral question: why do we allow killSession on secondaries? I remember mongos will send killSession to all nodes in a replset, but it has always been unclear to me who should own a session. A session is used by a transaction, and a transaction's life cycle is owned by the primary, so it sounds like the session should be owned by the primary too. Session is also used by secondary read |
| Comment by Kaloian Manassiev [ 07/Mar/19 ] |
|
From looking at the chain of calls that leads to _applyPrepareTransaction, I see that as implemented currently, the applier (SyncTail) never expects to be interrupted (or for any exception to escape while applying the oplog). Since making it exception-safe is going to be a lot of work, and given that it can't be interrupted anyway, I propose that we fix this by making any sessions checked out through MongoDOperationContextSessionWithoutRefresh uninterruptible by making the entire applyCommand_inlock call uninterruptible if it is called with OplogApplication::Mode != kApplyOpsCmd. Alternatively, I could make only this invocation uninterruptible. This will take care of all code paths where MongoDOperationContextSessionWithoutRefresh is used without impacting the interruptibility of the applyOps command itself. siyuan.zhou, tess.avitabile, what do you think? |
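The proposal above can be sketched as a scoped guard that suppresses interruption unless the call comes from the applyOps command. This is a standalone toy, not the committed fix: `ToyOpCtx`, `ToyUninterruptibleScope`, `ToyApplyMode`, and `applyCommand` are all hypothetical names, and the server's real types and interruption API are not used.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical mirror of the application mode named in the proposal.
enum class ToyApplyMode { kSecondary, kApplyOpsCmd };

// Hypothetical operation context with a nestable "ignore interrupts" counter.
struct ToyOpCtx {
    std::atomic<bool> killed{false};
    int ignoreInterruptsDepth = 0;

    void checkForInterrupt() const {
        if (ignoreInterruptsDepth == 0 && killed.load()) {
            throw std::runtime_error("operation was interrupted");
        }
    }
};

// RAII guard: interruption checks are suppressed while the guard is alive.
class ToyUninterruptibleScope {
public:
    explicit ToyUninterruptibleScope(ToyOpCtx& opCtx) : _opCtx(opCtx) {
        ++_opCtx.ignoreInterruptsDepth;
    }
    ~ToyUninterruptibleScope() {
        assert(_opCtx.ignoreInterruptsDepth > 0);
        --_opCtx.ignoreInterruptsDepth;
    }
    ToyUninterruptibleScope(const ToyUninterruptibleScope&) = delete;
    ToyUninterruptibleScope& operator=(const ToyUninterruptibleScope&) = delete;

private:
    ToyOpCtx& _opCtx;
};

// Stand-in for the work done while applying a command; it contains an
// interruption point.
void applyCommandBody(ToyOpCtx& opCtx) {
    opCtx.checkForInterrupt();
}

// Returns true if the command was applied, false if it was interrupted.
// Only the applyOps command path stays interruptible.
bool applyCommand(ToyOpCtx& opCtx, ToyApplyMode mode) {
    try {
        if (mode != ToyApplyMode::kApplyOpsCmd) {
            ToyUninterruptibleScope guard(opCtx);
            applyCommandBody(opCtx);
        } else {
            applyCommandBody(opCtx);
        }
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

With the context already marked killed, `applyCommand(ctx, ToyApplyMode::kSecondary)` still applies the command, while `applyCommand(ctx, ToyApplyMode::kApplyOpsCmd)` reports the interruption. Using a counter instead of a boolean lets guards nest without the inner one re-enabling interruption too early.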
| Comment by Judah Schvimer [ 06/Mar/19 ] |
|
As part of this ticket, please unblacklist tests marked with a TODO against this ticket. |
| Comment by Judah Schvimer [ 03/Dec/18 ] |
|
Open question: what happens currently if you call "killOp" on a secondary oplog applier thread? |
| Comment by Kaloian Manassiev [ 30/Nov/18 ] |
|
Is the problem here that killSession can come concurrently with a session which is currently being operated on by multiApply? The sessions refactor work will not help with this. If the applier code cannot handle interruptions, it should mark the operation context uninterruptible. It is a separate problem if we want to avoid interrupting sessions with prepared transactions on them as described in Let's talk about this on Monday when I will be in the office. |
| Comment by Gregory McKeon (Inactive) [ 29/Nov/18 ] |
|
judah.schvimer, I pinged kaloian.manassiev to talk to you tomorrow about his session refactor work and whether this will help wrap up these sorts of issues. |
| Comment by Judah Schvimer [ 28/Nov/18 ] |
|
This may be solved by some combination of |