[GODRIVER-2145] Connection pool improvements Created: 02/Sep/21  Updated: 28/Oct/23  Resolved: 05/Nov/21

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.8.0

Type: Epic
Priority: Major - P3
Reporter: Kevin Albertson
Assignee: Matt Dale
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Start date:
End date:
Calendar Time: 16 weeks, 4 days
Scope Cost Estimate: 5
Cost to Date: 12
Final Cost Estimate: 12
Cost Threshold %: 150
Detailed Project Statuses:

Engineer: Matt

Summary: Address unnecessary connection pool clears and connection churn on operations with a small context timeout

2021-11-02: Updated target date to 2021-11-05

Status update:

  • Functional changes merged.
  • Tesla has been informed of the changes and encouraged to run a load test.
  • Remaining work is technical debt.

Rationale for delays:

  • No rationale.

Risks:

  • None.

2021-10-19: Updated target date to 2021-10-29

Status update:

  • Connection pool rewrite is waiting on final review from Kevin and on resolving a problem where a stress test uses up all ports on some hosts.
  • Put up a review for reducing connection churn by erroring early if the operation timeout is less than the minimum tracked RTT.
  • Next: planning to ask Tesla to test a release candidate

Rationale for delays:

  • Unexpected test failure caused by stress tests using up all ports.
  • Longer review time from reviewers with competing priorities.

Risks:

  • None.

2021-10-05: Updated target date to 2021-10-22.

Status update:

  • The change that makes the connection pool create connections in the background is still in review. Meeting to discuss it in person next week.
  • Stress tests now in review.
  • Investigated and scoped a solution for avoiding connection churn when little time remains on an operation context.

Rationale for delays:

  • Four days of PTO.
  • Review of connection pool redesign taking longer than anticipated.

Risks:

  • None.

2021-09-21: No update to target date.

Status update:

  • The change that makes the connection pool create connections in the background is in review.

Rationale for delays:

  • No delays.

Risks:

  • None. Still on track for the 2021-10-15 target date.

2021-09-07: Setting target date to 2021-10-15

Rationale: Estimated remaining work totals 6 weeks:


Description

Summary

Address unnecessary connection pool clears and connection churn on operations with a small context timeout.

Motivation

Who is the affected end user?

Customers and users setting small context timeouts on operations.

How does this affect the end user?

The Go driver can enter a state where all connections in a connection pool have perished (usually because the pool was cleared after a timeout during new connection establishment) and almost no new connections can be established in-line with an operation because the operation timeout is too low. Customers using the Go driver may encounter application outages because the Go driver cannot recover from this state.

How likely is it that this problem or use case will occur?

There are known customers experiencing this issue. It is likely to occur when users set a small context timeout on operations.

If the problem does occur, what are the consequences and how severe are they?

The problem can result in an application outage.

Is this issue urgent?

Yes, there are known customers that cannot migrate workloads to the Go driver due to this issue.

Is this ticket required by a downstream team?

No

Is this ticket only for tests?

No

Cast of Characters

Engineering Lead: Matt Dale
Document Author: Matt Dale
POCers:
Product Owner:
Program Manager:
Stakeholders:

Channels & Docs

Technical Design Document


Generated at Thu Feb 08 08:37:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.