[SERVER-43258] {shutdown:1} command (always?) fails Created: 10/Sep/19  Updated: 06/Dec/22  Resolved: 24/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Oleg Pudeyev (Inactive) Assignee: Backlog - Service Architecture
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Sprint: Service Arch 2019-09-23, Service Arch 2019-10-07
Participants:

 Description   

When a client sends the {shutdown:1} admin command, the server appears to close the connection on which the command was sent before sending a response.

Shell test:

ruby-driver-rs:SECONDARY> db.runCommand({shutdown:1})
2019-09-10T18:03:08.584-0400 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'shutdown' on host '127.0.0.1:30201'  :
DB.prototype.runCommand@src/mongo/shell/db.js:168:1
@(shell):1:1
2019-09-10T18:03:08.585-0400 I NETWORK  [thread1] trying reconnect to 127.0.0.1:30201 (127.0.0.1) failed
2019-09-10T18:03:08.585-0400 W NETWORK  [thread1] Failed to connect to 127.0.0.1:30201, in(checking socket for error after poll), reason: Connection refused
2019-09-10T18:03:08.585-0400 I NETWORK  [thread1] reconnect 127.0.0.1:30201 (127.0.0.1) failed failed 
2019-09-10T18:03:08.589-0400 I NETWORK  [thread1] trying reconnect to 127.0.0.1:30201 (127.0.0.1) failed
2019-09-10T18:03:08.589-0400 W NETWORK  [thread1] Failed to connect to 127.0.0.1:30201, in(checking socket for error after poll), reason: Connection refused
2019-09-10T18:03:08.589-0400 I NETWORK  [thread1] reconnect 127.0.0.1:30201 (127.0.0.1) failed failed 
> 

Ruby driver test:

irb(main):006:0> c.database.command(shutdown:1)
D, [2019-09-10T18:00:34.843124 #11418] DEBUG -- : MONGODB | localhost:14440 | admin.shutdown | FAILED | EOFError: end of file reached (for 127.0.0.1:14440 (no TLS)) | 0.004188575s

Given that the command always fails, it is difficult/impossible for the application trying to shut down a server to verify that the server initiated the shutdown command.

The command can also fail for other reasons, for example when it is issued against a database other than admin:

ruby-driver-rs:SECONDARY> db.runCommand({shutdown:1})
{
	"operationTime" : Timestamp(1568152973, 1),
	"ok" : 0,
	"errmsg" : "shutdown may only be run against the admin database.",
	"code" : 13,
	"codeName" : "Unauthorized",
	"$clusterTime" : {
		"clusterTime" : Timestamp(1568152973, 1),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}
ruby-driver-rs:SECONDARY> 

This is especially problematic when scripting `mongo` from the Unix shell, since the exit status alone gives no guarantee about whether {shutdown:1} succeeded or failed. A driver can inspect the precise error message returned and act accordingly, but this requires a fair amount of effort and does not help if there was in fact a network problem during the execution of the command.
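
A minimal Python sketch of the client-side workaround described above, to make the ambiguity concrete. All names here (ShutdownOutcome, classify_shutdown, run_command) are hypothetical and not part of any driver API; run_command stands in for whatever call the driver exposes, and the exception types are assumptions about how a dropped connection surfaces.

```python
from enum import Enum


class ShutdownOutcome(Enum):
    ACKNOWLEDGED = "acknowledged"          # server replied with ok: 1
    REJECTED = "rejected"                  # server replied with ok: 0
    PROBABLY_STARTED = "probably_started"  # connection dropped, cause unknown


def classify_shutdown(run_command):
    """Run {shutdown: 1} through the supplied callable and classify the result.

    run_command is a hypothetical stand-in for a driver call: it takes a
    command document and either returns the reply document or raises when
    the connection is lost.
    """
    try:
        reply = run_command({"shutdown": 1})
    except (ConnectionError, EOFError):
        # Assumed exception types. A dropped connection is what a shutting
        # down server produces today, but a real network failure looks the
        # same from here.
        return ShutdownOutcome.PROBABLY_STARTED
    if reply.get("ok") == 1:
        return ShutdownOutcome.ACKNOWLEDGED
    return ShutdownOutcome.REJECTED
```

Only the REJECTED case (such as the "admin database" error above) is unambiguous; PROBABLY_STARTED cannot distinguish a shutdown in progress from a network problem, which is the gap this ticket describes.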

The server should send a successful response to the command and then close the connection (or perform other shutdown-related activities).



 Comments   
Comment by Lauren Lewis (Inactive) [ 24/Feb/22 ]

We haven’t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Oleg Pudeyev (Inactive) [ 13/Sep/19 ]

To clarify, I think it is perfectly ok to close (or start closing) the connection on the server after sending the response to the shutdown command. My impression is right now the connection on which shutdown command is received is closed without a response being sent first.

> Would that best effort response actually be useful to you?

Yes, this is what I had in mind I think.

Comment by Mira Carey [ 13/Sep/19 ]

> Given that the shutdown command was just received on a connection, in most cases the connection should be operational to send the response on and close it I would think?

The problem is that with TCP (at the stream level) there is no way to know that the other side has received any bytes you've sent. You need some kind of signal that the client has received your response: either the client closing the socket, or a reply sent back to you.

In the absence of that, you have to rely on timeouts.


I think this is something we could tack on, because there's already a certain amount of extra time between deciding to enter terminal shutdown and the process actually going down, but I'd still constrain how long we wait. That is, I'd send the reply, half-close the socket, and then leave it open until actual process death. I think that would make the user experience a bit cleaner most of the time, but I still wouldn't be able to guarantee you always get that response (without changing the contract with clients).

Would that best effort response actually be useful to you?
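
The best-effort variant described in this comment (send the reply, half-close, wait a bounded time) might be sketched as follows in Python. This is an illustration with plain sockets, not the server's implementation; reply_then_linger and grace_seconds are hypothetical names.

```python
import socket
import time


def reply_then_linger(conn, reply, grace_seconds):
    """Send a reply, half-close, and wait a bounded time for the peer's EOF.

    Returns True if the peer closed its side within grace_seconds, which is
    the only signal that it had a chance to read the reply. Returns False on
    timeout, in which case the caller proceeds without any such guarantee.
    """
    conn.sendall(reply)             # only queues the bytes for delivery
    conn.shutdown(socket.SHUT_WR)   # half close: we will write nothing more
    deadline = time.monotonic() + grace_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        conn.settimeout(remaining)
        try:
            if not conn.recv(4096):  # empty read means the peer sent EOF
                return True
        except socket.timeout:
            return False
```

The return value only tells the caller whether the peer went away in time; on False the shutdown continues anyway, which is why the response stays best effort.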

Comment by Oleg Pudeyev (Inactive) [ 13/Sep/19 ]

mysql seems to be able to do it:

speed# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 51
Server version: 10.3.17-MariaDB-1 Debian buildd-unstable
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> shutdown
    -> ;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [(none)]> 
MariaDB [(none)]> select 1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
ERROR: Can't connect to the server
 
unknown [(none)]> 

Given that the shutdown command was just received on a connection, in most cases the connection should be operational to send the response on and close it I would think?

Comment by Mira Carey [ 13/Sep/19 ]

> The server should send a successful response to the command and then close the connection (or perform other shutdown-related activities).

Part of the problem is that we don't have any way to do that reliably. We can ::send() a response, but all that does is put it on the outbound write queue; it doesn't say anything about when the other side will receive it. A ::send() followed shortly thereafter by a ::shutdown() does not guarantee that the message is received. The only reliable thing would be to make the flow:

Client                   Server
sendShutdownCommand
                         send response
                         half close socket
receive response
fully close socket
                         see eof on socket
                         shutdown server

Doing that would effectively prevent the shutdown from proceeding until the client went away.

For something unreliable, we could add in a sleep, but do you think it's worthwhile if it isn't reliable?

It's also worth noting that any waiting would effectively make shutdown take longer for all existing users in order to enable a use case which doesn't currently work.
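
For illustration, the reliable flow in the table above can be sketched with plain Python sockets over loopback. This is a toy, not MongoDB code: the b"shutdown" and b"ok" payloads and all function names are made up for the example.

```python
import socket
import threading


def serve_one_shutdown(listener, events):
    """Server side: reply, half-close, then block until the client's EOF."""
    conn, _ = listener.accept()
    with conn:
        events.append(("server received", conn.recv(1024)))
        conn.sendall(b"ok")              # send response
        conn.shutdown(socket.SHUT_WR)    # half close socket
        while conn.recv(1024):           # blocks until the client closes
            pass
        events.append(("server saw eof", None))
    # A real server would only start shutting down at this point.


def request_shutdown(address):
    """Client side: send the command, read the reply until EOF, then close."""
    with socket.create_connection(address) as sock:
        sock.sendall(b"shutdown")
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:                 # server half-closed: reply complete
                break
            chunks.append(data)
        return b"".join(chunks)          # leaving the with block closes fully


def demo():
    events = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # any free local port
        listener.listen(1)
        server = threading.Thread(target=serve_one_shutdown,
                                  args=(listener, events))
        server.start()
        reply = request_shutdown(listener.getsockname())
        server.join()
    return reply, events
```

Note that serve_one_shutdown has no timeout: if the client never closes its socket, the server never proceeds, which is the drawback pointed out above.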

Comment by Oleg Pudeyev (Inactive) [ 12/Sep/19 ]

It is my understanding that https://jira.mongodb.org/browse/SERVER-5467 essentially proposes to work around the server always failing the command in the shell. While this will fix the shell, it will not do anything for drivers and thus for any applications using the drivers to interact with the server.

However, if this ticket is addressed then https://jira.mongodb.org/browse/SERVER-5467 would become redundant, thus I think https://jira.mongodb.org/browse/SERVER-5467 can be closed as a duplicate of this ticket.

Comment by Danny Hatcher (Inactive) [ 11/Sep/19 ]

I believe this falls under SERVER-5467 and should be closed as a dupe. oleg.pudeyev do you agree?

Generated at Thu Feb 08 05:02:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.