[SERVER-59838] killCursors doesn't kill idleCursor on sharded clusters Created: 08/Sep/21 Updated: 27/Oct/23 Resolved: 16/Sep/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Edwin Zhou | Assignee: | Bernard Gorman |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | query-director-triage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Sprint: | QE 2021-09-20 |
| Description |
|
Attempting to kill an idleCursor with killCursors reports that the cursorId is not found, even though the cursor is present in currentOp on a sharded cluster. On an unsharded replica set, the same kind of idleCursor is correctly identified and killed. Comparing currentOp on the unsharded and sharded deployments, the difference is that the idleCursor on the sharded cluster includes a cursor.originatingCommand.$audit field, which is absent from the idleCursor operation on the unsharded one. Observed output:
currentOp on sharded cluster:
currentOp on a different, unsharded cluster. This operation can be killed by the killCursors command.
Repro:
|
| Comments |
| Comment by Edwin Zhou [ 16/Sep/21 ] |
|
Thanks bernard.gorman, it makes sense that in order to kill idle cursors that are created when querying on a mongos, we should be killing the parent cursor from the mongos instead of child cursors that end up on the mongod. I'll go ahead and close this ticket.
| Comment by Bernard Gorman [ 15/Sep/21 ] |
|
Hi edwin.zhou, I believe the issue here is that you are attempting to kill a mongoD cursor on mongoS.

In a sharded cluster, when the user starts executing a query, we create and store a cursor on mongoS. This mongoS cursor may have multiple "child" cursors which it tracks internally, one on each shard, but the user/client only ever interacts with the single, parent mongoS cursor. For instance, the driver issues getMore requests using the mongoS cursorId; if it attempts to issue a getMore on one of the child cursor IDs, then mongoS will throw a CursorNotFound exception, because mongoS does not have a cursor with that ID in its CursorManager.

Similarly, if you run killCursors on mongoS and give it the ID of one of the shard cursors, it will do nothing, because mongoS does not own a cursor with the specified ID. However, if you run killCursors on the "parent" mongoS cursor ID, then it will kill that cursor AND clean up all the child shard cursors across the cluster.

In your script above, you are using $currentOp to retrieve the IDs of idle cursors. But by default, $currentOp retrieves all operations and cursors from the shards. In order to retrieve cursors and operations that are running on mongoS, you need to specify the {localOps:true} option. Below is a script which demonstrates this point by killing the mongoS cursor and confirming that both it, and its child cursor on the shard, are cleaned up.
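The approach described above can be sketched in mongo shell JavaScript. This is a minimal illustration, not the script from the original comment: the helper names idleCursorPipeline and killCursorsCommand and the namespace "test.coll" are hypothetical, and the commented usage assumes a shell connected to mongoS with an idle cursor open on that namespace.

```javascript
// Build a $currentOp pipeline that lists idle cursors owned by mongoS itself.
// localOps: true asks mongoS to report its own operations and cursors
// rather than the ones running on the shards.
function idleCursorPipeline(ns) {
  return [
    { $currentOp: { allUsers: true, idleCursors: true, localOps: true } },
    { $match: { type: "idleCursor", ns: ns } },
  ];
}

// Build a killCursors command document for a namespace and a list of cursor IDs.
// killCursors takes the collection name, so the database prefix is stripped.
function killCursorsCommand(ns, cursorIds) {
  const coll = ns.substring(ns.indexOf(".") + 1);
  return { killCursors: coll, cursors: cursorIds };
}

// Hypothetical usage in a shell connected to mongoS:
//
//   const idle = db.getSiblingDB("admin")
//     .aggregate(idleCursorPipeline("test.coll"))
//     .toArray();
//   const ids = idle.map((op) => op.cursor.cursorId);
//   printjson(db.getSiblingDB("test").runCommand(killCursorsCommand("test.coll", ids)));
//
// The IDs collected this way are parent mongoS cursor IDs, which mongoS can
// find and kill; the shard-side child cursor IDs would be reported as not found.
```

The two helpers only build command documents, so the difference from the failing repro is confined to the localOps option: without it, the cursor IDs passed to killCursors would come from the shards rather than from mongoS.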
Please let me know if this explains the behaviour you were seeing. If so, I'll close this ticket as "Works As Designed."