[SERVER-40583] Provide reason for why connections are closed
Created: 11/Apr/19  Updated: 08/Jan/24  Resolved: 08/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.9
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Oleg Pudeyev (Inactive)
Assignee: Benjamin Caimano (Inactive)
Resolution: Duplicate
Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-34621 Log if client attempts renegotiation Closed
Related
related to SERVER-40556 Error: error doing query: failed: net... Closed
related to RUBY-1713 Topology flapping under load with agg... Closed
Sprint: Service Arch 2019-05-06, Service Arch 2019-05-20, Service Arch 2019-07-15
Participants:

Description

Currently, when the server closes a connection, it sometimes logs the reason and sometimes does not.

SERVER-39941 provides an example when the server does indicate the reason for closing a connection:

2019-03-04T13:07:11.146-0500 E NETWORK  [conn129] no SSL certificate provided by peer; connection rejected
 
2019-03-04T13:07:11.146-0500 I NETWORK  [conn129] Error receiving request from client: SSLHandshakeFailed: no SSL certificate provided by peer; connection rejected. Ending connection from 127.0.0.1:48388 (connection id: 129)

SERVER-40556 provides an example when the server does not indicate the reason:

2019-04-09T20:55:09.182-0400 I NETWORK  [listener] connection accepted from 113.55.127.140:3456 #3015 (1 connection now open)
 
2019-04-09T20:55:09.182-0400 I NETWORK  [conn3015] received client metadata from 113.55.127.140:3456 conn3015: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.8" }, os: { type: "Windows", name: "Microsoft Windows 10", architecture: "x86_64", version: "10.0 (build 17763)" } }
 
2019-04-09T20:55:09.184-0400 I NETWORK  [conn3015] end connection 113.55.127.140:3456 (0 connections now open)

In cases like RUBY-1713, connections are closed when the driver does not expect it, which leads the driver to mark the server as unknown (per the SDAM specification) and can ultimately cause application unavailability. Troubleshooting these cases is currently difficult because the server log does not indicate whether the server itself initiated the close (and if so, for what reason) or whether it closed the connection because, for example, a network error occurred on it.
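
As a rough illustration of the troubleshooting gap described above, the sketch below scans 4.0-style log lines like the ones quoted in this ticket and reports, per connection, whether a close reason was logged. The function name `summarize_closes`, the regular expression, and the reliance on the "Ending connection from" and "end connection" phrasing are assumptions drawn only from the sample lines here, not from a documented log format.

```python
import re

# Assumed layout of a 4.0-style log line, inferred from the samples in this ticket:
# timestamp, severity letter, component, [context], message.
LINE_RE = re.compile(r"^(\S+)\s+([A-Z])\s+(\S+)\s+\[([^\]]+)\]\s+(.*)$")

# Assumed phrasing of a close that carries a reason (taken from the SERVER-39941 sample).
REASON_MARKER = ". Ending connection from "


def summarize_closes(lines):
    """Map each closed connection context (e.g. 'conn129') to its logged
    close reason, or to None when the log shows only a bare 'end connection'."""
    closes = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # blank line or unrecognized format
        _ts, _severity, _component, ctx, msg = match.groups()
        if REASON_MARKER in msg:
            # Keep the text before the marker as the reason.
            closes[ctx] = msg.split(REASON_MARKER, 1)[0]
        elif msg.startswith("end connection "):
            # Closed, but do not overwrite a reason seen earlier for this context.
            closes.setdefault(ctx, None)
    return closes


SAMPLE = [
    "2019-03-04T13:07:11.146-0500 I NETWORK  [conn129] Error receiving request from client: "
    "SSLHandshakeFailed: no SSL certificate provided by peer; connection rejected. "
    "Ending connection from 127.0.0.1:48388 (connection id: 129)",
    "2019-04-09T20:55:09.182-0400 I NETWORK  [listener] connection accepted from "
    "113.55.127.140:3456 #3015 (1 connection now open)",
    "2019-04-09T20:55:09.184-0400 I NETWORK  [conn3015] end connection "
    "113.55.127.140:3456 (0 connections now open)",
]

if __name__ == "__main__":
    for ctx, reason in summarize_closes(SAMPLE).items():
        print(ctx, "->", reason if reason is not None else "no reason logged")
```

On the two samples from this ticket, conn129 is reported with its SSLHandshakeFailed reason and conn3015 is reported as closed with no reason logged, which is the case this ticket asks the server to address.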



Comments
Comment by Benjamin Caimano (Inactive) [ 08/Jul/19 ]

Alright, marking this as a duplicate then. Thanks, oleg.pudeyev, for getting back to me on this long-in-the-tooth ticket.

Comment by Oleg Pudeyev (Inactive) [ 08/Jul/19 ]

This appears to satisfy the request in this ticket:

> Since 4.0.10/4.1.3, we always log at D2 why connections close except when repl or FCV trigger terminate.

Comment by Benjamin Caimano (Inactive) [ 02/Jul/19 ]

Since 4.0.10/4.1.3, we always log at D2 why connections close except when repl or FCV trigger terminate. Those should be pretty noisy spots in the logs. (For D2 logging, see here and here.)

oleg.pudeyev, your reading of the spec is correct: we close the socket on you, which is a network error on the client side, and thus the host is UNKNOWN. In the case of a network error on the server side, obviously we can't do much. We theoretically hang up on TransportLayer::TicketSessionClosedStatus, but that isn't a real thing. That leaves the case of "Interruptions", which pretty much means shutdown as far as I can tell. We could theoretically wait around for one more command and then respond {ok: 0}, but that is contrary to the "exit hard" philosophy of mongo-server. I'm not certain what more we can offer in terms of diagnostics on the server side. If you have concrete suggestions, I'm all ears.

CC mira.carey@mongodb.com
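
For reference, the D2 level mentioned in the comment above corresponds to log verbosity 2. A minimal sketch of a `setParameter` command document that raises verbosity for one log component follows. That the close-reason messages are emitted under the `network` component is an assumption (the log lines in the description carry the NETWORK tag), and the helper name `network_verbosity_command` is hypothetical.

```python
def network_verbosity_command(level=2):
    """Build a setParameter command document that sets the verbosity of the
    'network' log component (assumed to be where close reasons are logged)."""
    if not 0 <= level <= 5:
        raise ValueError("verbosity must be between 0 and 5")
    return {
        "setParameter": 1,
        "logComponentVerbosity": {"network": {"verbosity": level}},
    }


# With a driver such as PyMongo (not imported here), the document could be
# sent to the admin database, for example:
#   MongoClient("mongodb://localhost:27017").admin.command(network_verbosity_command())
# The host and port above are placeholders.

if __name__ == "__main__":
    print(network_verbosity_command())
```

Raising verbosity on a busy server increases log volume, so it is probably best applied only while reproducing the unexpected closes and then reset to 0.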

Comment by Benjamin Caimano (Inactive) [ 02/Jul/19 ]

In the case of SERVER-40556, we increased the diagnostic information available in server logs via SERVER-34621 which landed in 4.0.10, sadly a later version than the customer is using.
