[SERVER-21710] Allow pinned ClientCursors to be killed on mongod Created: 01/Dec/15  Updated: 29/Jan/18  Resolved: 10/Jan/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 3.7.1

Type: Improvement
Priority: Major - P3
Reporter: David Storch
Assignee: Ian Boros
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-32307 Make AsyncResultsMerger kill sequence... Closed
Related
related to SERVER-28090 Add ability to interrupt operations u... Closed
is related to JAVA-2651 Unable to exit from hasNext() for tai... Closed
is related to DRIVERS-421 Cursor iteration should complete (abn... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2018-01-01, Query 2018-01-15
Participants:

 Description   

On mongod, a killCursors command or OP_KILL_CURSORS message acting on a pinned cursor will fail to kill the cursor. By contrast, the mongos ClusterCursorManager can kill an alive cursor regardless of whether or not it is pinned. We should change the mongod behavior to be consistent with that of mongos. The killCursors operation should make a best effort to kill all cursors, irrespective of whether the cursor is currently in use. Failing to kill in this edge case opens the possibility of leaked ClientCursors.
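The requested semantics (a kill succeeds whether or not the cursor is pinned) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCursorManager` and all of its methods are hypothetical names for illustration, and it does not reflect how mongod or the mongos ClusterCursorManager is actually implemented. Here a pinned cursor is marked as killed immediately and reclaimed when it is unpinned, so it cannot leak.

```python
class ToyCursorManager:
    """Toy pin/kill bookkeeping (hypothetical; not the server implementation)."""

    def __init__(self):
        # cursor id -> {"pinned": bool, "killed": bool}
        self._cursors = {}

    def register(self, cursor_id):
        self._cursors[cursor_id] = {"pinned": False, "killed": False}

    def pin(self, cursor_id):
        """Check the cursor out for use, e.g. while a getMore runs."""
        state = self._cursors.get(cursor_id)
        if state is None or state["killed"]:
            raise KeyError(f"cursor {cursor_id} not found")
        if state["pinned"]:
            raise RuntimeError(f"cursor {cursor_id} is already in use")
        state["pinned"] = True

    def unpin(self, cursor_id):
        """Return the cursor; reclaim it if it was killed while pinned."""
        state = self._cursors[cursor_id]
        state["pinned"] = False
        if state["killed"]:
            del self._cursors[cursor_id]

    def kill(self, cursor_id):
        """Best-effort kill that succeeds regardless of pin state."""
        state = self._cursors.get(cursor_id)
        if state is None or state["killed"]:
            return False
        if state["pinned"]:
            # Assumption: defer cleanup to whoever holds the pin.
            state["killed"] = True
        else:
            del self._cursors[cursor_id]
        return True

    def is_alive(self, cursor_id):
        state = self._cursors.get(cursor_id)
        return state is not None and not state["killed"]
```

In this model, `kill()` on a pinned cursor returns `True` and the cursor stops being alive at once, while the entry itself is removed on the next `unpin()`.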



 Comments   
Comment by Githook User [ 10/Jan/18 ]

Author: Ian Boros <ian.boros@10gen.com>

Message: SERVER-21710 Add ability to kill pinned cursors on mongod
Branch: master
https://github.com/mongodb/mongo/commit/da80e97d103434a6bc566c589a23af13477e1a28

Comment by David Storch [ 08/Nov/17 ]

Yeah, that's known behavior, thanks for the correction. Changing that should fall into the scope of this ticket as well.

Comment by Jeffrey Yemin [ 07/Nov/17 ]

FWIW killSessions doesn't work when there is a pinned change stream cursor associated with the session:

// request
{ "killSessions" : [{ "id" : { "$binary" : "faSCkv0KTxePjULgM1JiuQ==", "$type" : "04" } }], "$db" : "admin", ... }
// response
{ "ok" : 0.0, "errmsg" : "Cannot kill pinned cursor: 7646490736246576201", "code" : 96, "codeName" : "OperationFailed"... }

Comment by David Storch [ 07/Nov/17 ]

I see. Clients should not attempt to issue killCursors from a separate thread. Since killing pinned cursors has never been supported, killCursors is a way for the thread that is iterating the cursor to safely abandon it without exhausting the result set. It has never been a way to kill an active cursor from another thread. That's what killOp/killSessions are for. $changeStream cursors, and tailable cursors in general, are certainly more susceptible to this problem since cursors are pinned for longer. But it's never a good idea to use killCursors while a getMore may be in progress on versions 3.6 and older.

Clearly it would be desirable to allow applications to behave as you describe against future versions of the server. As I alluded to above, we're hoping to schedule this ticket as part of an effort to make killing queries easier, especially queries against sharded clusters.

Comment by Jeffrey Yemin [ 07/Nov/17 ]

david.storch a change stream cursor will typically remain pinned for the majority of its lifetime when the collection that is being watched is mostly idle. Furthermore, because drivers generally iterate cursors in a blocking fashion, for tailable cursors there is typically a loop within the call to cursor.next() which repeatedly calls getMore until at least one document is returned or until an error is reported (e.g. cursorNotFound). So if an application contains a thread that is blocking on a change stream cursor, and another thread that attempts to kill that cursor in order to unblock the first thread, the killCursors command can fail with the following error:

{ "ok" : 0.0, "errmsg" : "cursor id 5290539973175233834 is already in use", "code" : 12051, "codeName" : "Location12051", ... }

This is different from what's reported for a normal tailable cursor in a similar situation:

{ "cursorsKilled" : [], "cursorsNotFound" : [], "cursorsAlive" : [{ "$numberLong" : "237307113234" }], "cursorsUnknown" : [], "ok" : 1.0, ... }

but the effect is the same: the cursor remains alive, and an application with a thread blocking on the cursor iteration may never exit.

Drivers could work around this by checking, inside the inner loop that calls getMore, whether the application has attempted to close the cursor. That would catch most of the problems, but it is not currently specified behavior for all drivers, and it has the bad effect of leaving the change stream cursor open on the server until it times out.
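The workaround can be sketched as follows. This is an illustration only, not the behavior of any particular driver: `ClosableTailingCursor`, `CursorClosed`, and the `get_more` and `kill_cursor` callables are all hypothetical stand-ins. As one possible variation, the sketch also issues the kill from the iterating thread once the loop exits, when no getMore is in flight, which is an assumption about how the server-side cursor could be released without waiting for the timeout.

```python
import threading
from collections import deque


class CursorClosed(Exception):
    """Raised by next() after the application has closed the cursor."""


class ClosableTailingCursor:
    def __init__(self, get_more, kill_cursor):
        # get_more: hypothetical callable returning a (possibly empty) list of documents.
        # kill_cursor: hypothetical callable that sends killCursors for this cursor.
        self._get_more = get_more
        self._kill_cursor = kill_cursor
        self._closed = threading.Event()
        self._buffer = deque()

    def close(self):
        """May be called from any thread; only records the request."""
        self._closed.set()

    def next(self):
        # Check the closed flag before every getMore instead of looping blindly.
        while not self._closed.is_set():
            if self._buffer:
                return self._buffer.popleft()
            self._buffer.extend(self._get_more())
        # No getMore is in flight here, so the iterating thread can kill the cursor.
        self._kill_cursor()
        raise CursorClosed()
```

Note that a close request is only observed between getMore calls, so the blocked thread wakes up after the current getMore returns rather than immediately.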

Comment by David Storch [ 07/Nov/17 ]

jeff.yemin, I'm not sure I follow the relationship between this ticket and change streams, can you elaborate?

Note that pinned cursors may be killed with killOp. Also note that we hope to improve cursor-killing behavior for 3.8. Our hope is that in 3.8 users will be able to preemptively kill all cursors belonging to a sharded operation using killSessions.

Comment by Jeffrey Yemin [ 06/Nov/17 ]

This appears to be the root cause of JAVA-2651, where a user reports a tailable cursor that won't die. david.storch I'm wondering if this affects change streams, as the pattern of closing a change stream cursor from another thread, in order to stop the change stream cursor iteration, is likely to be quite common.

Comment by Mathias Stearn [ 07/Feb/17 ]

This will be needed to support abandoning exhaust cursors without closing the connection.
