[DRIVERS-2480] Mitigate negative effects of OCSP endpoint timeouts Created: 25/Oct/22 Updated: 08/Nov/22 |
|
| Status: | Backlog |
| Project: | Drivers |
| Component/s: | OCSP |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Major - P3 |
| Reporter: | Jeremy Mikola | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Driver Changes: | Needed |
| Description |
Summary

When OCSP stapling is unavailable, drivers may attempt to contact one or more OCSP endpoints. Per Suggested OCSP Behavior, the default timeout is five seconds. Drivers use connectTimeoutMS as the timeout for the connection handshake (Server Monitoring spec), and the handshake includes TLS (Handshake spec). Therefore, an inaccessible OCSP endpoint could add five seconds to the handshake. If the application is using a smaller connectTimeoutMS value, an inaccessible OCSP endpoint could prevent the driver from establishing a connection to the server.

This is irrespective of whether a driver has "soft fail" behavior (i.e. TLS continues if OCSP cannot complete). Drivers with "soft fail" behavior would allow the connection to continue after hitting an OCSP timeout, but only if connectTimeoutMS has not been exhausted.

When this was observed in a customer report involving the PHP driver, there was originally no indication that TLS/OCSP was involved, as the problem manifested itself as a server selection failure due to a socket timeout while attempting to establish a connection. We ultimately confirmed the issue thanks to libmongoc trace logs.

There are several courses of action we might consider to address this:
Note: the Client Side Operations Timeout spec may influence OCSP timeouts; however, even if OCSP timeouts are configurable (and dynamically scale down based on the remaining timeoutMS), I think we'd still face an issue with exposing the source of the timeout. In that case, action items for documentation and logging may still be worth addressing.

Motivation

Who is the affected end user?
Applications using TLS with OCSP but without OCSP stapling.

How does this affect the end user?
OCSP timeouts could prevent the driver from making server connections by exhausting the connection timeout.

How likely is it that this problem or use case will occur?
This is rare, but could happen due to many factors: an app server firewall preventing outgoing HTTP requests, the OCSP server experiencing downtime, or high latency contacting the OCSP server.

If the problem does occur, what are the consequences and how severe are they?
Ranges from merely delaying a connection to preventing it entirely.

Is this issue urgent?
No.

Is this ticket required by a downstream team?
No.

Is this ticket only for tests?
No. |
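The timing interaction described in the summary can be illustrated with a toy model. This is only a sketch under stated assumptions: `simulate_handshake`, its parameters, and the latency figures are hypothetical and do not correspond to any driver API. The only value taken from the ticket is the five-second default OCSP timeout.

```python
# Toy model of how an OCSP endpoint timeout eats into connectTimeoutMS.
# All names and latency numbers are hypothetical; only the 5000 ms default
# OCSP timeout comes from the ticket description.
OCSP_DEFAULT_TIMEOUT_MS = 5000


def simulate_handshake(connect_timeout_ms, ocsp_reachable, soft_fail=True,
                       ocsp_timeout_ms=OCSP_DEFAULT_TIMEOUT_MS,
                       tls_ms=50, ocsp_latency_ms=100):
    """Return (outcome, elapsed_ms) for one simulated connection handshake."""
    # An unreachable endpoint costs the full OCSP timeout before giving up.
    ocsp_ms = ocsp_latency_ms if ocsp_reachable else ocsp_timeout_ms
    elapsed = tls_ms + ocsp_ms

    # The connection timeout covers the whole handshake, OCSP included,
    # so it can fire before "soft fail" ever gets a chance to apply.
    if elapsed > connect_timeout_ms:
        return ("connect timeout", connect_timeout_ms)

    # OCSP gave up within budget: hard-fail drivers abort, soft-fail continue.
    if not ocsp_reachable and not soft_fail:
        return ("tls failure (ocsp hard fail)", elapsed)
    return ("connected", elapsed)


if __name__ == "__main__":
    # Generous connectTimeoutMS: soft fail only delays the connection.
    print(simulate_handshake(10000, ocsp_reachable=False))
    # Small connectTimeoutMS: the connection fails despite soft fail.
    print(simulate_handshake(2000, ocsp_reachable=False))
```

In this model the second call surfaces as a plain connection timeout, with nothing in the outcome pointing at OCSP, which mirrors why the customer report was hard to diagnose.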
| Comments |
| Comment by Kaitlin Mahar [ 03/Nov/22 ] |
|
jmikola@mongodb.com, I filed and linked DRIVERS-2494 to cover OCSP logging specifically. |
| Comment by Tom Selander [ 25/Oct/22 ] |
|
Leads Triage: Backlogging this for now, we may decide to go with the third suggested option. |
| Comment by Jeremy Mikola [ 25/Oct/22 ] |
|
kaitlin.mahar@mongodb.com: I'm not sure if OCSP logging would fall under SDAM (DRIVERS-1670) (as I expect the handshake spec might), but if not you may want to create a separate ticket for the OCSP and link it up here. |