[SERVER-59825] Include the Connection ID in OCSP Error Messages within mongoD logs Created: 03/Sep/21  Updated: 27/Oct/23  Resolved: 21/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aaron Bromberg Assignee: Spencer Jackson
Resolution: Community Answered Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Security 2022-01-24
Participants:

 Description   

Problem:
With the switch to Structured Logging and the inclusion of OCSP options within the driver and server, when OCSP errors occur, the associated log lines do not include a connection ID and therefore cannot be definitively traced back to the client/source of the error.

Example:

{"t":{"$date":"2021-09-03T14:24:44.242+00:00"},"s":"W",  "c":"NETWORK",  "id":5512201, "ctx":"OCSP Fetch and Staple","msg":"Server was unable to staple OCSP Response","attr":{"reason":{"code":141,"codeName":"SSLHandshakeFailed","errmsg":"SSL peer certificate revocation status checking failed: Could not verify X509 certificate store for OCSP Stapling. error:00000000:lib(0):func(0):reason(0)"}}}

This makes it difficult to separate internal Atlas issues from client-side issues, and has led to multiple support tickets where customers quote these log lines as reasons for application connection problems.

Proposed Solution:
Include the connection ID in the log line so that we can provide a "complete connection lifetime" from the server logs, and filter actual issues from Atlas noise.
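
As a rough illustration of the filtering problem, the Python sketch below checks whether a structured log line carries a connection context. It assumes that connection-bound entries have a "ctx" value of the form conn<N>. The helper name connection_id and the second sample entry are hypothetical; the first sample entry is a shortened form of the example above.

```python
import json
import re

# Assumption: log entries tied to a client connection use a "ctx" value
# such as "conn42"; background work uses a descriptive thread name instead.
CONN_CTX = re.compile(r"^conn(\d+)$")


def connection_id(log_line):
    """Return the connection number found in "ctx", or None if there is none."""
    entry = json.loads(log_line)
    match = CONN_CTX.match(entry.get("ctx", ""))
    return int(match.group(1)) if match else None


# Shortened form of the log line quoted in this ticket.
ocsp_line = json.dumps({
    "s": "W",
    "c": "NETWORK",
    "id": 5512201,
    "ctx": "OCSP Fetch and Staple",
    "msg": "Server was unable to staple OCSP Response",
})

# Hypothetical connection-bound entry, for contrast only.
conn_line = json.dumps({
    "s": "I",
    "c": "NETWORK",
    "ctx": "conn42",
    "msg": "hypothetical connection-scoped message",
})

print(connection_id(ocsp_line))  # no connection context on the OCSP entry
print(connection_id(conn_line))
```

With the current log format, the OCSP entry yields no connection ID, so it cannot be grouped with the lifetime of any particular client connection.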



 Comments   
Comment by Spencer Jackson [ 21/Jan/22 ]

As a note, OCSP can happen in several places:
1) Client side. When a driver that supports OCSP connects to a server that presents an X.509 certificate supporting OCSP but does not staple a response, the client will go and fetch an OCSP response itself.
2) OCSP Stapling. A server can pre-emptively fetch OCSP responses, which it will provide to clients during their TLS handshake. This allows clients to avoid performing response acquisition.
3) Server side, while talking to other servers which don't support OCSP stapling. A server opening a connection to another server may need to perform the same logic that clients perform.
We'll never see the outcome of scenario 1: if something goes wrong, the client hangs up, but we never learn why. Scenario 2 is totally asynchronous and not tied to particular clients or connections. Scenario 3 happens in egress networking and is purely server-to-server, so it's not necessarily associated with an individual end-user connection.

Comment by Aaron Bromberg [ 30/Dec/21 ]

Hi spencer.jackson.  Since the OCSP validation is supposed to happen in the background, do you know if we are expecting to show INFO and WARN OCSP log lines in the mongod logs within Atlas when OCSP is enabled in the driver/client?

Comment by Spencer Jackson [ 30/Dec/21 ]

Hello aaron.bromberg, I do not believe this request is possible, because the "Fetch and Staple" operation is performed independently of any client connection. It is a background operation which pre-emptively requests OCSP responses and readies them for transmission to future clients. The error message that you're observing could be related to SERVER-55122.

Comment by Lauren Lewis (Inactive) [ 21/Dec/21 ]

Moving to backlog-server-security for triage.

Generated at Thu Feb 08 05:48:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.