[SERVER-32103] CTRL-C in mongo shell does not terminate long running ops if connected to mongos
Created: 28/Nov/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Shell
Affects Version/s: 3.4.10
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: James Kovacs
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 0
Labels: neweng, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-34077 jstests/core/shellkillop.js doesn't d... Closed
Related
is related to SERVER-1445 jstests/shellkillop.js fails in a sha... Closed
is related to SERVER-23168 Shell killOperationsOnAllConnections ... Closed
is related to SERVER-18094 currentOp on a mongoS should also sho... Closed
is related to SERVER-28649 Create a currentRouterOps command in ... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:
  1. Create a sharded cluster (e.g. mlaunch init --sharded 1 --single --port 47017)
  2. Connect to the mongos using the mongo shell: mongo --port 47017
  3. Start a long-running operation:

    use test
    db.foo.remove({})
    db.foo.insert({})
    db.foo.find({$where: "for(var i=0;i<100000;i++) sleep(1000)"})
    

  4. Launch a second mongo shell (mongo --port 47017) and execute db.currentOp(). Notice the long-running operation started in the previous step.
  5. In the first shell, press CTRL-C. Notice that you are not prompted to terminate the operation.
  6. In the second shell, run db.currentOp(). Notice that the long-running operation is still executing.

If you repeat the above steps connected to the mongod directly instead, you are prompted to terminate the long-running command as expected.

Sprint: Query 2018-03-26
Participants:
Case:

 Description   

killOperationsOnAllConnections is called in response to CTRL-C in the mongo shell. It finds long-running operations started by this shell and provides the user the option to terminate them. This is accomplished with the currentOp and killOp commands.

For each connection, the shell calls currentOp and examines client or client_s to determine whether each operation was started by the current shell. When a command is run against a mongod, client contains the IP:PORT of the mongo shell, so these commands prompt for termination. If, however, the command is routed through a mongos, client_s contains the IP:PORT of the mongos, not the mongo shell. These commands do not prompt for termination and are left running on the mongod.
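
The matching step described above can be sketched in plain JavaScript. This is a minimal illustration and not the shell's actual implementation: findOwnOps is a hypothetical helper, and the sample addresses and opids are made up. It only shows why an address comparison succeeds against a mongod and fails through a mongos.

```javascript
// Hypothetical helper (not the shell's real code): returns the entries of a
// currentOp-style "inprog" array whose reported client equals shellAddress.
function findOwnOps(inprog, shellAddress) {
  return inprog.filter(function (op) {
    // A mongod reports "client"; operations routed via mongos report "client_s".
    var reported = op.client !== undefined ? op.client : op.client_s;
    return reported === shellAddress;
  });
}

// Made-up sample data, for illustration only.
var shellAddress = "127.0.0.1:53518";
// Direct connection: "client" is the shell's own host and port.
var direct = [{ opid: 101, client: "127.0.0.1:53518" }];
// Through mongos: "client_s" is the mongos side of the connection.
var viaMongos = [{ opid: 202, client_s: "127.0.0.1:60422" }];

var ownDirect = findOwnOps(direct, shellAddress);       // finds the operation
var ownViaMongos = findOwnOps(viaMongos, shellAddress); // finds nothing
```

With no matches in the mongos case, there is nothing to offer the user for termination, which is the behavior reported in this ticket.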



 Comments   
Comment by David Storch [ 28/Jun/19 ]

Flagging for re-triage. We may be able to pursue a simpler fix to this now. ian.boros's prototype for a fix involved creating a new DBCommandCursor concept, but this may now be supported by the pre-existing DBClientCursor class.

Comment by Charlie Swanson [ 08/Feb/18 ]

Ok - so I think I've figured out what's going on here.

It looks like the shell is trying to identify which operations in the currentOp output were started by itself (this particular instance of the shell). It wants to do this to avoid killing all operations on the server, instead killing only its own.

To do this, it's looking at either the 'client' or 'client_s' field in the currentOp output. The 'client' field usually identifies who started the operation (it's something like "127.0.0.1:53518"; from what I can tell, this is the host and port that the request came from). When the shell is talking to a mongod, this will be the shell's own URI, because the request was sent directly from the shell to the mongod. Thus, when connected to a mongod, the shell will be able to correctly kill its own operations.

When currentOp is run against mongos, the 'client' field is renamed to 'client_s', to make it clear that this is describing which mongos the operation originated from. The shell uses this field to try to figure out if the request came from itself, but it will never match. The shell never connects directly to the mongod which generates this information, so the 'client_s' field will always be the host and port corresponding to the mongos' request.

So, the shell never finds any operations in progress that it started, and never prompts the user to interrupt them. It looks like interrupting operations from a shell connected to a mongos has just never worked. I tested on 3.2, 3.4 and 3.6: it crashed on 3.2 (SERVER-23168), and never prompted on 3.4 and 3.6.

I see two options going forward:

  1. We can fix this by instead relying on the client metadata to identify our own operations. The shell generates the client metadata object, including a 'clientMetadata.mongos.client' field which has the host and port we're looking for. It seems safe to assume that the server will never modify the client metadata, and will always report it back the way we sent it. We could use this information to identify our own operations. This strategy would likely be eligible for backport to 3.6 at least (3.4 looks like it doesn't include as much client metadata information in currentOp, so it might not work there).
  2. We can wait until we implement SERVER-18094, in which case the (mongos-local) currentOp entry will likely (hopefully) include the shell's desired information.
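
Option 1 could look roughly like the sketch below. It assumes each currentOp entry carries a clientMetadata document with the 'mongos.client' field mentioned above; that document shape has not been checked against a real server, and findOwnOpsByMetadata and the sample data are hypothetical.

```javascript
// Hypothetical helper: identifies this shell's operations by the client
// metadata echoed back in currentOp, instead of by 'client' / 'client_s'.
function findOwnOpsByMetadata(inprog, shellAddress) {
  return inprog.filter(function (op) {
    var meta = op.clientMetadata;
    // Assumed shape: { clientMetadata: { mongos: { client: "host:port" } } }
    return Boolean(meta && meta.mongos && meta.mongos.client === shellAddress);
  });
}

// Made-up sample data, for illustration only.
var myAddress = "127.0.0.1:53518";
var inprogSample = [
  // Started by this shell and routed through mongos.
  { opid: 1, client_s: "127.0.0.1:60422",
    clientMetadata: { mongos: { client: "127.0.0.1:53518" } } },
  // Started by some other client through the same mongos.
  { opid: 2, client_s: "127.0.0.1:60422",
    clientMetadata: { mongos: { client: "127.0.0.1:53999" } } },
  // Entry with no client metadata at all (as may happen on older versions).
  { opid: 3, client_s: "127.0.0.1:60422" }
];

var mine = findOwnOpsByMetadata(inprogSample, myAddress);
```

Entries without metadata are skipped rather than treated as matches, so the shell would still never offer to kill an operation it cannot attribute to itself.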
Comment by Ian Whalen (Inactive) [ 12/Jan/18 ]

Assigning to Charlie for a future sprint to at least investigate the complexity, so we can decide whether to work this into the project.

Comment by Gregory McKeon (Inactive) [ 05/Jan/18 ]

Sending to Query team to see if this can be done as part of the sharded kill epic.
