Drivers / DRIVERS-2480

Mitigate negative effects of OCSP endpoint timeouts

    • Type: Spec Change
    • Resolution: Unresolved
    • Priority: Major - P3
    • Component/s: OCSP
    • Labels: None
    • Needed

      Summary

      When OCSP stapling is unavailable, drivers may attempt to contact one or more OCSP endpoints. Per Suggested OCSP Behavior, the default timeout is five seconds.

      Drivers use connectTimeoutMS as the timeout for the connection handshake (Server Monitoring spec), and the handshake includes TLS (Handshake spec). Therefore, an inaccessible OCSP endpoint could add five seconds to the handshake.

      If the application is using a smaller connectTimeoutMS value, an inaccessible OCSP endpoint could prevent the driver from establishing a connection to the server. This is irrespective of whether a driver has "soft fail" behavior (i.e. TLS continues if OCSP cannot complete). Drivers with "soft fail" behavior would allow the connection to continue after hitting an OCSP timeout, but only if connectTimeoutMS has not been exhausted.
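
      The interaction comes down to simple arithmetic. The sketch below is a simplified model, not driver code: it assumes an unreachable OCSP endpoint blocks the handshake for the full five-second default, and `tls_ms` is a hypothetical figure for the rest of the TLS handshake.

```python
OCSP_DEFAULT_TIMEOUT_MS = 5_000  # default cited from Suggested OCSP Behavior


def handshake_outcome(connect_timeout_ms: int, tls_ms: int, ocsp_reachable: bool) -> str:
    """Simplified model: return "connected" or "connect timeout"."""
    # Assumption: an unreachable endpoint blocks for the whole OCSP timeout,
    # and a reachable one adds negligible time.
    ocsp_ms = 0 if ocsp_reachable else OCSP_DEFAULT_TIMEOUT_MS
    elapsed_ms = tls_ms + ocsp_ms
    # "Soft fail" only helps if the connection budget is not already spent.
    return "connected" if elapsed_ms < connect_timeout_ms else "connect timeout"


print(handshake_outcome(10_000, 200, ocsp_reachable=False))  # connected
print(handshake_outcome(3_000, 200, ocsp_reachable=False))   # connect timeout
print(handshake_outcome(3_000, 200, ocsp_reachable=True))    # connected
```

      In this model, the second call fails even with "soft fail" behavior, because the five-second OCSP wait alone exceeds the three-second connectTimeoutMS.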

      When this was observed in a customer report involving the PHP driver, there was initially no indication that TLS/OCSP was involved, as the problem manifested as a server selection failure caused by a socket timeout while establishing a connection. We ultimately confirmed the issue thanks to libmongoc trace logs.

      There are several courses of action we might consider to address this:

      • Allow OCSP timeouts to be configurable (if supported by a driver's TLS library)
      • Provide documentation to educate users on the interaction between OCSP and connection timeouts. If OCSP timeouts cannot be configured, users should be aware that the five-second default might exhaust connectTimeoutMS.
      • Note that tlsDisableOCSPEndpointCheck and tlsDisableCertificateRevocationCheck may be used to work around this issue. In the related PHP issue, the customer used tlsAllowInvalidCertificates, which is inadvisable because it disables much more than OCSP.
      • Add logging for OCSP. There is presently no ticket to add log messages to the OCSP spec (see: Logging component and linked issues in DRIVERS-1204).
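
      As an illustration of the narrower workaround, a connection string might set only tlsDisableOCSPEndpointCheck rather than tlsAllowInvalidCertificates. This is a minimal sketch: the host `db.example.com` and the `build_uri` helper are hypothetical, and whether a given driver honors the option depends on its TLS library.

```python
from urllib.parse import urlencode


def build_uri(host: str, **options: str) -> str:
    """Hypothetical helper: assemble a MongoDB connection string from URI options."""
    return f"mongodb://{host}/?{urlencode(options)}"


# Disable only the OCSP endpoint check; other certificate validation stays on.
uri = build_uri(
    "db.example.com:27017",  # placeholder host
    tls="true",
    tlsDisableOCSPEndpointCheck="true",
)
print(uri)
# mongodb://db.example.com:27017/?tls=true&tlsDisableOCSPEndpointCheck=true
```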

      Note: the Client Side Operations Timeout spec may influence OCSP timeouts; however, even if OCSP timeouts are configurable (and dynamically scale down based on the remaining timeoutMS), I think we'd still face the problem of exposing the source of the timeout. In that case, the documentation and logging action items may still be worth addressing.
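
      One way to picture the "scale down" idea is to cap the OCSP timeout at whatever remains of timeoutMS. The function below is a hypothetical sketch of that rule, not behavior defined by any spec; `effective_ocsp_timeout_ms` and its parameters are illustrative names.

```python
from typing import Optional


def effective_ocsp_timeout_ms(
    remaining_timeout_ms: Optional[int],
    default_ms: int = 5_000,
) -> int:
    """Hypothetical rule: never let the OCSP check outlive the remaining budget."""
    if remaining_timeout_ms is None:
        # No timeoutMS configured: fall back to the five-second default.
        return default_ms
    # Clamp to [0, default_ms] so an exhausted budget yields zero.
    return max(0, min(default_ms, remaining_timeout_ms))


print(effective_ocsp_timeout_ms(None))   # 5000
print(effective_ocsp_timeout_ms(1_200))  # 1200
print(effective_ocsp_timeout_ms(8_000))  # 5000
```

      Even under such a rule, a timeout caused by OCSP would still surface as a generic timeout unless it is logged or documented.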

      Motivation

      Who is the affected end user?

      Applications using TLS with OCSP but without OCSP stapling.

      How does this affect the end user?

      OCSP timeouts could prevent the driver from making server connections by exhausting the connection timeout.

      How likely is it that this problem or use case will occur?

      This is rare, but could happen for several reasons: an app server firewall blocking outgoing HTTP requests, OCSP server downtime, or high latency when contacting the OCSP server.

      If the problem does occur, what are the consequences and how severe are they?

      Consequences range from merely delaying a connection to preventing it entirely.

      Is this issue urgent?

      No.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No.

            Assignee: Unassigned
            Reporter: Jeremy Mikola (jmikola@mongodb.com)
            Votes: 0
            Watchers: 4
