[DRIVERS-2347] Prevent conflating operation timeout with connection establishment timeout Created: 03/Jun/22 Updated: 06/Nov/23 |
|
| Status: | In Progress |
| Project: | Drivers |
| Component/s: | CMAP, CSOT |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Unknown |
| Reporter: | Matt Dale | Assignee: | Shane Harvey |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Epic Link: | DRIVERS-555 |
| Driver Changes: | Needed |
| Quarter: | FY24Q3, FY25Q1 |
| Engineering Lead: | |
| Program Manager: | |
| Start date: | |
| Driver Compliance: | |
| Description |
Summary

The process for checking out and establishing connections described in the CMAP spec, combined with the timeout behavior described in the Server Selection spec, causes issues when users specify low operation timeouts. Specifically, the CMAP spec describes that if there are no available connections, a connection pool should establish a new connection in-line with the check-out. The CSOT spec describes that the timeout used to create a TCP/TLS connection is min(connectTimeoutMS, min(serverSelectionTimeoutMS, remaining timeoutMS)) and the timeout used to handshake with the MongoDB server should be min(operationTimeout, remaining computedServerSelectionTimeout). As a result, if an operation times out, any in-progress connection establishment necessarily times out as well. If most operations have low timeouts (e.g. 1-5 seconds), the driver may not have enough time to establish any new connections, leading to a state where the driver cannot create any connections.

Consider the case discovered in the Go driver, which supports client-side operation timeouts via the Go context.Context type:

Note that the above was relevant for Go Driver v1.7.x and earlier. See

Any driver that implements a connection pool as described in the current CMAP spec and implements client-side operation timeout as described in the current CSOT spec will potentially encounter the same issue discovered in the Go driver.

To prevent that issue, drivers must never let operation timeout influence connection establishment timeout. Honoring both operation timeout and connection establishment timeout requires running connection check-out and connection establishment in different threads. Drivers should continue connection establishment for connectTimeoutMS, even if the check-out that requested the new connection times out.

Update the CMAP spec to describe the necessary separation of threads of execution between connection check-out and connection establishment. Update the CSOT spec to describe that connection establishment should always use a timeout of connectTimeoutMS, independent of operation timeout.

Note that implementing CSOT likely requires drivers to refactor their connection pool implementations.

Note that an alternative to separate threads is to always continue establishing connections for connectTimeoutMS, even if the operation timeout has expired.

Consider the list of drivers that establish connections in the checkOut function

Motivation

Who is the affected end user?
Users using drivers that support client-side operation timeouts, especially users who set low timeouts (1-5 seconds) and run services with high operation volumes (1,000+ op/sec).

How does this affect the end user?
The driver may enter a state where it cannot create any new connections for a long period of time. The user's services may experience extended outages if that happens.

How likely is it that this problem or use case will occur?
Fairly likely for users who set low timeouts (1-5 seconds) and run services with high operation volumes (1,000+ op/sec).

If the problem does occur, what are the consequences and how severe are they?
The user's services may experience extended periods where the driver cannot establish connections and cannot do any work, either at startup or intermittently during the operation of the service.

Is this issue urgent?
Must be completed before DRIVERS-555 can be implemented in most drivers.

Is this ticket required by a downstream team?
No.

Is this ticket only for tests?
No. |
| Comments |
| Comment by Shane Harvey [ 03/Jun/22 ] |
|
Is this a duplicate of DRIVERS-1801? |