[SERVER-33553] OP_KILL_CURSORS fails on mongos: Unable to check out cursor for killCursor
Created: 28/Feb/18 Updated: 29/Oct/23 Resolved: 04/Jun/18
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.3 |
| Fix Version/s: | 3.6.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | Ian Boros |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: |
| Sprint: | Query 2018-06-04, Query 2018-06-18 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Sending OP_KILL_CURSORS to a 3.6.3 mongos fails when auth is enabled, with the error: "Unable to check out cursor for killCursor. Namespace: 'pymongo_test.test', cursor id: 3368188369600609201." Here is the mongos log, including the initial find command, the OP_KILL_CURSORS request, and a final getMore that succeeds:
I can reproduce this on 3.6.3 but not on the latest MongoDB version: |
| Comments |
| Comment by Githook User [ 07/Jun/18 ] |
|
Author: Ian Boros (ian.boros@10gen.com)
Message: |
| Comment by Githook User [ 04/Jun/18 ] |
|
Author: Ian Boros (ian.boros@10gen.com)
Message: |
| Comment by Charlie Swanson [ 10/May/18 ] |
|
Bumping this back into "Needs Triage". |
| Comment by Charlie Swanson [ 23/Mar/18 ] |
|
We believe that the fix outlined by Ian above is the correct fix, but we are not scheduling this immediately. |
| Comment by Ian Whalen (Inactive) [ 16/Mar/18 ] |
|
ian.boros, can you please talk with behackett about mixing sessions and OP_KILL_CURSORS? We'll come back to this in a future triage. |
| Comment by David Storch [ 07/Mar/18 ] |
|
It's unlikely that we can backport the change from 3.7, since it was part of a larger effort to improve how sharded queries are killed. We would likely pursue a custom fix for the 3.6 branch under this ticket. |
| Comment by Bernie Hackett [ 02/Mar/18 ] |
|
I like that theory. Can we backport the 3.7 change? |
| Comment by Ian Boros [ 01/Mar/18 ] |
|
Here's a theory: in strategy.cpp (the code path for OP_KILL_CURSORS), we try to check out the cursor:
There, checkOutCursor is called with the default checkSessionAuth value of kCheckSession:
That code in strategy.cpp is relatively new on 3.6 (check the blame). The commit date is January 10 (note that the author date is much older).
I recently made a change on master so that we don't attempt to check out a cursor before killing it when running OP_KILL_CURSORS. This might explain why the problem appears only in 3.6.3 and not in 3.6.0 or master, and also why it only happens with OP_KILL_CURSORS. The code path for the killCursors command passes kNoCheckSession to checkOutCursor. My guess is that we need to update the call in strategy.cpp to do the same.
I can't reproduce this using the shell, and I think the reason is what Dave pointed out: sessions are not supported with legacy readMode. |
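To make the theory concrete, here is a toy model in Python. Everything in it is hypothetical: `ToyCursorManager`, `check_out_cursor`, and the `CheckSession` enum only borrow the names from the comment (`checkOutCursor`, `kCheckSession`, `kNoCheckSession`) and do not reflect the actual mongos implementation, its authorization logic, or whether auth is enabled. It assumes the session check simply compares the session that owns the cursor with the session on the request, and shows why a session-less legacy kill fails under the default while the no-check value lets it through.

```python
from enum import Enum, auto
from typing import Dict, Optional


class CheckSession(Enum):
    K_CHECK_SESSION = auto()     # stands in for kCheckSession (the default)
    K_NO_CHECK_SESSION = auto()  # stands in for kNoCheckSession


class CursorCheckoutError(Exception):
    """Raised when a cursor cannot be checked out."""


class ToyCursorManager:
    """Hypothetical model of a mongos cursor manager; not MongoDB code."""

    def __init__(self) -> None:
        # cursor id -> lsid of the session that created it (None if no session)
        self._owners: Dict[int, Optional[str]] = {}

    def register(self, cursor_id: int, lsid: Optional[str]) -> None:
        self._owners[cursor_id] = lsid

    def is_open(self, cursor_id: int) -> bool:
        return cursor_id in self._owners

    def check_out_cursor(self, cursor_id: int, request_lsid: Optional[str],
                         check: CheckSession = CheckSession.K_CHECK_SESSION) -> None:
        if cursor_id not in self._owners:
            raise CursorCheckoutError(f"cursor id {cursor_id} not found")
        # With the session check on, the request's session must match the
        # session that owns the cursor. In this model a legacy kill request
        # carries no lsid, so it never matches a session-owned cursor.
        if check is CheckSession.K_CHECK_SESSION and self._owners[cursor_id] != request_lsid:
            raise CursorCheckoutError(f"session mismatch for cursor id {cursor_id}")

    def kill(self, cursor_id: int, request_lsid: Optional[str],
             check: CheckSession = CheckSession.K_CHECK_SESSION) -> None:
        # Check out first (raises on failure), then remove the cursor.
        self.check_out_cursor(cursor_id, request_lsid, check)
        del self._owners[cursor_id]


mgr = ToyCursorManager()
mgr.register(42, lsid="implicit-session-1")  # find sent inside an implicit session
try:
    mgr.kill(42, request_lsid=None)          # legacy kill: no lsid, default check
except CursorCheckoutError as err:
    print("kill failed:", err)
mgr.kill(42, request_lsid=None, check=CheckSession.K_NO_CHECK_SESSION)
print("still open:", mgr.is_open(42))
```

In this model, a cursor registered with `lsid=None` (like a find sent by PyMongo 3.5.1, which sends no session) is killed fine under the default check, matching the observation that only session-created cursors are affected. The suggested fix corresponds to having only the OP_KILL_CURSORS call site opt out of the session check, as the killCursors command path already does, rather than dropping the check everywhere.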
| Comment by Bernie Hackett [ 01/Mar/18 ] |
|
Is this |
| Comment by Bernie Hackett [ 01/Mar/18 ] |
|
Implicit sessions were a requirement for drivers. Again, I agree this situation is weird. Interestingly, this bug also seems to have been in earlier versions of 3.7 (see the linked PYTHON ticket), but doesn't manifest in the current master codebase. I can't tell from the git log what might have changed to resolve it. |
| Comment by David Storch [ 01/Mar/18 ] |
|
Sessions aren't even supported with legacy readMode, right? Do we have any guarantees about things working if the client combines legacy wire ops like OP_KILL_CURSORS with cursors created inside a logical session? |
| Comment by Shane Harvey [ 28/Feb/18 ] |
|
OP_KILL_CURSORS works fine with PyMongo 3.5.1, where the only difference is that no session is sent with the find command. The bug seems to be that a cursor created with a session cannot be killed on mongos via OP_KILL_CURSORS. |
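For context on why a session cannot travel with this request: in the legacy wire protocol, an OP_KILL_CURSORS message (opCode 2007) carries only a standard 16-byte header, a reserved zero, a count, and a list of 64-bit cursor IDs. It has no namespace field and no field for a logical session ID (lsid). The sketch below encodes one with Python's `struct` module, assuming that documented little-endian layout; `encode_kill_cursors` and the request ID are illustrative and not part of PyMongo's API.

```python
import struct

OP_KILL_CURSORS = 2007  # legacy opcode from the MongoDB wire protocol


def encode_kill_cursors(request_id: int, cursor_ids: list[int]) -> bytes:
    """Build a legacy OP_KILL_CURSORS message (illustrative helper, not a PyMongo API)."""
    # Body: int32 ZERO (reserved), int32 numberOfCursorIDs, then int64 cursorIDs.
    # There is no namespace field and no lsid (logical session id) field.
    body = struct.pack("<ii", 0, len(cursor_ids))
    body += struct.pack(f"<{len(cursor_ids)}q", *cursor_ids)
    # Standard header: int32 messageLength, requestID, responseTo, opCode.
    header = struct.pack("<iiii", 16 + len(body), request_id, 0, OP_KILL_CURSORS)
    return header + body


# Encode a kill for the cursor id from the error message in the description.
msg = encode_kill_cursors(request_id=1, cursor_ids=[3368188369600609201])
print(len(msg), msg[12:16].hex())  # prints: 32 d7070000
```

Because the message has no slot for an lsid, mongos has no way to match the kill against the session that owns the cursor, which is consistent with the failure only appearing for cursors created inside a session.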
| Comment by Bernie Hackett [ 28/Feb/18 ] |
|
The related Python test passes with MongoDB 3.6.0 and 3.7.latest. The test case is a bit weird. It tests an unfortunate situation caused by a deprecated PyMongo API we can't break before the next major version bump. shane.harvey, can you try to reproduce this with an old PyMongo version that doesn't support the find command? 3.1.x should do it. I'm concerned that OP_KILL_CURSORS is just broken, period, which would break any application that uses an old driver. |