[SERVER-38224] Calls to `killSessionsAction` will fail to kill all sessions if one is already killed Created: 21/Nov/18 Updated: 29/Oct/23 Resolved: 12/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | 4.1.5 |
| Fix Version/s: | 4.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Sharding 2018-12-03, Sharding 2018-12-17 | ||||
| Participants: | |||||
| Linked BF Score: | 67 | ||||
| Description |
|
The sessions kill API was designed so that subsequent attempts to kill a session will throw a ConflictingOperationInProgress exception until checkoutSessionForKill is called. This causes problems for attempts to kill sessions using the killSessionsAction call, because the call will fail at the first session that has already been killed, leaving the remaining sessions alive. |
| Comments |
| Comment by Githook User [ 12/Dec/18 ] |
|
Author: Kaloian Manassiev &lt;kaloian.manassiev@mongodb.com&gt; (username: kaloianm)
Message: Also reverts commit e13e069902099d601c6cf64def9fc374082a629e ( |
| Comment by Siyuan Zhou [ 29/Nov/18 ] |
|
Talked with Kal offline; now we use a thread pool to invalidate sessions in |
| Comment by Kaloian Manassiev [ 29/Nov/18 ] |
|
No, the contract will still be: "If you call kill on a session, you must call checkOutForKill after that with the provided token." Calling kill will increment a counter of how many kills are outstanding and will expect that many calls to checkOutForKill to be made afterwards.
This would be equivalent to having a "drain" mode on the SessionCatalog, where no new sessions can be created. But I think at shutdown we stop the connections anyway, so I don't think it is necessary, is it? |
| Comment by Siyuan Zhou [ 29/Nov/18 ] |
Just to clarify, option 2 will serialize the two kills, but the first kill may not call checkOutForKill() first, so shutdown may run first and destroy the sessions. This is not the BF case, but it could happen, I think. Please correct me if I'm wrong. In contrast, enqueuing a LockManager lock would guarantee the ordering.

Good point on using lock manager locks for session catalog concurrency. I was just wondering if we are reinventing the wheel as we need more and more concurrency control.

Another thought: should we have different ways to kill sessions? Should the shutdown kill be separated into two parts - the common kill code path (abortArbitraryTransaction) and the shutdown cleanup - where the cleanup waits after joining all checked-out sessions and has simpler synchronization? |
| Comment by Kaloian Manassiev [ 29/Nov/18 ] |
The reason for this is exactly that the two kill paths can be different - like you mention, there are different kinds of kills currently: for shutdown, for aborting expired transactions, etc. If we attach different decorations to the session, there could be different kill code paths as well. Because of this, there is currently no way to ensure that both kills execute: if the first is running, any subsequent one (the shutdown in this case) will just get a SessionAlreadyKilled exception.
This is exactly what is happening currently.
I think this is orthogonal to this problem - it just changes the type of synchronization to lock manager locks instead of the condition variable on the session catalog, and I would like to consider it separately. My initial issue with this approach, though, is that the locks we would need to use for the session would have to be of the RESOURCE_MUTEX type, and those do not belong anywhere in the lock manager hierarchy, so they are not any safer than mutex + condition_variable. Also, the SessionCatalog is now used on mongos as well, which doesn't have the lock manager. I am going to find a time for us to talk about this "in person" this week or next, when I am back in the NYC office, so we can resolve it more quickly. Basically, I think the issues we need to agree on are:
|
| Comment by Siyuan Zhou [ 29/Nov/18 ] |
|
If the second killer has to wait, I'm wondering if we can model the session checkout with LockManager locks. For each session, there would be one resource to represent whether the session can be killed or not (a SessionKill lock), and another resource to represent the Session itself.
Using a condition variable or whatever to notify killers also makes sense. Another, more aggressive design is to register a killFunction on the session after interrupting the operation context. The running thread would then execute the killFunction on checkIn(). Both the original killer and the second one would use the same future to learn when the kill finishes. |
| Comment by Siyuan Zhou [ 29/Nov/18 ] |
|
I'd prefer option 1, since double kill in option 2 overrides the kill reason and implies that the different kill code paths are idempotent, which may not be true, while the ultimate goal of the session refactoring is to reduce unnecessary concurrency.

For option 1, why does the second killer have to wait for the first one? It's fine for direct writes and expiration kills. This will be an issue for shutdown, since we don't want any asynchronous task still running after the shutdown. However, I don't think double kill will fix this problem, since the asynchronous kill task may run long after the shutdown kill is called. There has to be another way to synchronize them.

The original design doc also stated option 1, using the SessionAlreadyKilled exception. |
| Comment by Kaloian Manassiev [ 28/Nov/18 ] |
|
When a caller invokes Session::kill, the session is put in a "killed" state, which means it is the responsibility of this caller to perform whatever cleanup tasks they intend to do and to leave it in a "pristine" state, so to speak (if possible). Double kill would mean that if two threads do it concurrently, they will both be required to execute their cleanup tasks (serially, because of the checkoutSessionForKill serialization). This means the second "killer" might not actually have anything to do. In the meantime, until the second killer runs, other user threads will not be able to use the session, because it is going to be in the "marked for kill" state.

Regardless of this, though, the double-kill operation is no different from two kills happening one after another, so even if there were an active transaction, it would still be fine to kill it.

In the case of the linked BF, what happens is that a session is marked for kill because of a drop of config.system.sessions, and an asynchronous task is dispatched to finish killing it. Before that task completes, the server is shut down, which goes through all sessions to kill them, but it happens to find that session already killed and throws a ConflictingOperationInProgress exception in a place where it is not expected. |
| Comment by Siyuan Zhou [ 27/Nov/18 ] |
|
With option 2, what does it mean for double kill? Is it possible that the second kill actually kills a newly started transaction after the first one? BTW, who are the two concurrent killers in this case? |
| Comment by Kaloian Manassiev [ 21/Nov/18 ] |
|
There are two ways I can think of to fix this:
I prefer option (2) because it makes the sessions API simpler to use, and also because with option (1) (and with how kill works today) there is no way for a second "killer" to wait until the session has actually been killed, whereas with option (2) nobody will be able to use the session until all "killers" have completed. siyuan.zhou, does option (2) sound good to you? |