[SERVER-6496] provide a way to kill a sharded query on all shards Created: 17/Jul/12  Updated: 28/Feb/23  Resolved: 03/Apr/18

Status: Closed
Project: Core Server
Component/s: Querying, Sharding
Affects Version/s: None
Fix Version/s: 4.0.0-rc0

Type: New Feature Priority: Major - P3
Reporter: Alon Horev Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 17
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-28090 Add ability to interrupt operations u... Closed
Documented
Related
related to SERVER-25497 Fix sharded query path to handle shut... Closed
related to SERVER-4984 make it possible to interrupt operati... Closed
related to SERVER-17696 Terminate sharded queries immediately... Closed
is related to SERVER-18094 currentOp on a mongoS should also sho... Closed
is related to SERVER-32307 Make AsyncResultsMerger kill sequence... Closed
is related to SERVER-33462 Allow killop on a mongos op id Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible

 Description   
Issue Status as of April 4, 2018

FEATURE DESCRIPTION
This feature provides the following functionality:

  • Allows operators to list and kill queries running in a sharded cluster directly on a mongos node.
  • Killing a sharded query on one shard causes the query to be terminated promptly on all other shards in the cluster.

VERSIONS
This feature is available in the 3.7.4 and newer development versions, and in the 4.0 and newer MongoDB production releases.

RATIONALE
Before this feature, killing a query which is active across multiple targeted shards required operators to run killOp manually against each of the involved shards. Killing a query on one shard did not always terminate the operation in a timely manner across the entire cluster.

In addition, killing a sharded query on a mongos node was not possible.

OPERATION
Sharded queries running on a mongos can be listed using the localOps flag to the $currentOp aggregation metadata source:

mongos> use admin;
mongos> db.aggregate([{$currentOp: {localOps: true}}]);

The reported operation IDs identify operations running on the mongos node. A mongos-local operation ID can be used as an argument to the killOp command in order to terminate the queries that mongos issued to the targeted shards on behalf of the client. For instance, to kill a particular sharded operation matching <filter> use:

mongos> use admin;
mongos> let opToKill = db.aggregate([{$currentOp: {localOps: true}}, {$match: <filter>}]).toArray()[0];
mongos> let opid = opToKill.opid;
mongos> db.killOp(opid)
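The shell helpers above wrap two admin commands: an aggregate on the admin database whose first stage is $currentOp, and killOp. A minimal sketch of those raw command documents, assuming a client that can send them with db.adminCommand (shell) or an equivalent driver call. The helper names and the example filter {ns: "test.coll"} are hypothetical; killFirstMatch also guards against an empty result, where the snippet above would fail on opToKill.opid.

```javascript
// Hypothetical helpers: they build the raw commands behind the shell snippet above.

// $currentOp must be the first pipeline stage and runs as "aggregate: 1"
// against the admin database; localOps: true lists mongos-local operations.
function currentOpLocalCommand(filter) {
  const pipeline = [{ $currentOp: { localOps: true } }];
  if (filter) {
    pipeline.push({ $match: filter });
  }
  return { aggregate: 1, pipeline: pipeline, cursor: {} };
}

// killOp takes the opid reported by $currentOp in its "op" field.
function killOpCommand(opid) {
  return { killOp: 1, op: opid };
}

// Build a killOp for the first matching operation, or return null when
// nothing matched instead of dereferencing an undefined result.
function killFirstMatch(ops) {
  if (ops.length === 0) {
    return null;
  }
  return killOpCommand(ops[0].opid);
}

// Example (mongo shell connected to a mongos; the filter is illustrative):
//   let res = db.adminCommand(currentOpLocalCommand({ ns: "test.coll" }));
//   let cmd = killFirstMatch(res.cursor.firstBatch);
//   if (cmd) db.adminCommand(cmd);
```

Only the first batch of the $currentOp cursor is inspected here, which is normally enough when the filter narrows the result to a single operation.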

Original Description

Killing a query through mongos currently requires killing it manually on every single shard.
Running a query without specifying a shard key, or worse, without an index, is an easy mistake to make, yet it can be unacceptable in some scenarios (consider having to kill such a query by hand on a cluster of 100 shards).



 Comments   
Comment by David Storch [ 02/Apr/18 ]

Hi all,

This work is completed and will first be available in the 3.7.4 development release, which will evolve into the 4.0 stable release series.

Related ticket SERVER-18094 made it possible to list all sharded operations which are being executed on a particular mongos, using the localOps flag to the $currentOp aggregation metadata source. For example, you can run the following command against mongos in the shell:

mongos> use admin;
mongos> db.aggregate([{$currentOp: {localOps: true}}]);

Furthermore, mongos operation IDs are now killable due to the changes in SERVER-33462. That is, the opid field for each operation is valid for passing to mongos in a killOp command. Doing so will cause the entire sharded query, including sub-ops on each involved shard, to be terminated in a timely fashion. For instance, to kill a particular sharded operation matching <filter>:

mongos> use admin;
mongos> let opToKill = db.aggregate([{$currentOp: {localOps: true}}, {$match: <filter>}]).toArray()[0];
mongos> let opid = opToKill.opid;
mongos> db.killOp(opid)

Finally, 3.7.4 contains related enhancements which will cause sharded queries to quickly clean themselves up on all shards in the case of an error (e.g. SERVER-32307). Due to the enhancements listed above, this ticket can be closed as "Gone Away".

Comment by David Storch [ 14/Jul/16 ]

schwerin, I believe you are correct. As of MongoDB 3.2, if a mongos or mongod receives a find or aggregate command with a batchSize of 0, it will establish the cursor without doing the work to generate a batch of results. I'm not sure, however, whether the drivers expose a user-friendly mechanism for passing a batchSize of 0 with the initial find or aggregate and then a separate batchSize on each subsequent getMore.
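For reference, the batchSize 0 pattern can be expressed with raw commands even where a driver has no dedicated option for it. A minimal sketch, assuming the commands are sent with db.runCommand (shell) or a driver's generic command method; the helper names, collection name, and filter are hypothetical, while find, aggregate, getMore, and their batchSize fields are the standard command fields.

```javascript
// Hypothetical helpers for the "establish the cursor first" pattern.

// batchSize 0 asks the server to create the cursor and return its id
// with an empty firstBatch, without producing any results yet.
function openFindCursorCommand(coll, filter) {
  return { find: coll, filter: filter, batchSize: 0 };
}

// For aggregate, the batch size goes inside the "cursor" option.
function openAggregateCursorCommand(coll, pipeline) {
  return { aggregate: coll, pipeline: pipeline, cursor: { batchSize: 0 } };
}

// Subsequent getMore calls can request a normal, non-zero batch size.
function getMoreCommand(coll, cursorId, batchSize) {
  return { getMore: cursorId, collection: coll, batchSize: batchSize };
}

// Example (mongo shell; "coll" and the filter are illustrative):
//   let res = db.runCommand(openFindCursorCommand("coll", { x: { $gt: 5 } }));
//   let id = res.cursor.id;  // known before any results are produced
//   db.runCommand(getMoreCommand("coll", id, 100));
```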

Comment by Andy Schwerin [ 14/Jul/16 ]

Beginning in MongoDB 3.2, if you can still connect to the mongos where the query started, you can kill the cursor for aggregations and finds, and it will contact the shards and kill the corresponding cursors. This approach only applies to operations that return cursors, but it's a start.

If an operation is long-running before it returns any results at all, I believe that the user can request that the cursor id be returned before any results are produced, but I'm not 100% certain. david.storch or rassi might know.
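Killing a cursor through the mongos that created it is done with the killCursors command, whose reply lists cursor ids under cursorsKilled, cursorsNotFound, cursorsAlive, and cursorsUnknown. A minimal sketch, assuming the cursor id was obtained from an earlier find or aggregate reply; the helper names and the collection name are hypothetical.

```javascript
// Hypothetical helpers around the killCursors command.

// Build the command to send to the mongos that owns the cursor(s).
function killCursorsCommand(coll, cursorIds) {
  return { killCursors: coll, cursors: cursorIds };
}

// Summarize a killCursors reply as a count per outcome; missing
// fields are treated as empty lists.
function summarizeKillCursors(reply) {
  const count = (field) => (reply[field] || []).length;
  return {
    killed: count("cursorsKilled"),
    notFound: count("cursorsNotFound"),
    alive: count("cursorsAlive"),
    unknown: count("cursorsUnknown"),
  };
}

// Example (mongo shell connected to the originating mongos):
//   let reply = db.runCommand(killCursorsCommand("coll", [cursorId]));
//   summarizeKillCursors(reply);
```

As noted above, this only helps for operations that return a cursor, and only while the originating mongos is reachable.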

Generated at Thu Feb 08 03:11:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.