[GODRIVER-2024] Connection pool, long semaphore wait causes connection close Created: 25/May/21 Updated: 02/Sep/21 Resolved: 26/Jul/21 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | Connections |
| Affects Version/s: | 1.5.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Critical - P2 |
| Reporter: | Aaron Kahn | Assignee: | Matt Dale |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Epic Link: | Connection pool improvements |
| Case: | (copied to CRM) |
| Documentation Changes: | Needed |
| Documentation Changes Summary: | The proposed solution is to make two changes, both of which need additional documentation: |
| Description |
| Comments |
| Comment by Matt Dale [ 26/Jul/21 ] |
|
Update:
|
| Comment by Matt Dale [ 04/Jun/21 ] |
|
akahn@tesla.com Thanks for getting back to me on those questions! |
| Comment by Aaron Kahn [ 03/Jun/21 ] |
|
Matt, |
| Comment by Aaron Kahn [ 03/Jun/21 ] |
|
@matt.dale Thanks for the follow-up and for creating the additional issues. Good suggestion on MinPoolSize; we have used it in the past, but the app that caused the incident reported above did not have a minimum set. |
| Comment by Matt Dale [ 03/Jun/21 ] |
|
akahn@tesla.com I have a few questions to help me better understand your use case:
As for mitigations, I recommend setting the client minPoolSize to a value greater than 0. Setting minPoolSize starts a background goroutine that runs once per minute and attempts to maintain at least the configured number of connections in the pool, creating new connections until minPoolSize is reached. The maintenance goroutine creates connections using the timeout configured with SetConnectTimeout or, if none is set, the default connection timeout (30 seconds). E.g. client configuration:
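A minimal sketch of such a configuration, using the driver's `options.Client()` builder with `SetMinPoolSize` and `SetConnectTimeout`. The URI, the pool size of 10, and the timeout values are placeholder assumptions chosen for illustration, not values taken from this ticket.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	opts := options.Client().
		ApplyURI("mongodb://localhost:27017"). // placeholder URI (assumption)
		SetMinPoolSize(10).                    // illustrative value; any value > 0 enables pool maintenance
		SetConnectTimeout(10 * time.Second)    // timeout used when establishing connections (illustrative)

	// Context for the Connect call itself (illustrative value).
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, opts)
	if err != nil {
		log.Fatal(err)
	}
	defer func() {
		// Close pooled connections on shutdown.
		if err := client.Disconnect(context.Background()); err != nil {
			log.Println(err)
		}
	}()
}
```

With a non-zero minimum, connections should be established in the background under the connect timeout rather than on demand under an operation's deadline.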
Additionally, I've created two new tickets that describe driver improvements to reduce the impact of driver-side operation timeouts on the connection pool:
I think the changes to accomplish GODRIVER-2038 may be similar to the refactor you proposed using channels and async connection creation. If you're still working on that improvement, please continue and include me as a reviewer on any PRs. |
| Comment by Matt Dale [ 28/May/21 ] |
|
akahn@tesla.com I've been able to reproduce a similar problem: many connections are closed due to operation timeouts, and new connections are then created with the timeout of the operation context instead of a separate connection timeout. I'm still investigating why the connections are being closed in the first place and looking for short-term mitigations, but I agree that the design of the connection pool could be significantly improved. Please tag me on any PRs you open; I'd be happy to review them. Thanks! |
| Comment by Aaron Kahn [ 28/May/21 ] |
|
Matt, -Aaron |
| Comment by Matt Dale [ 28/May/21 ] |
|
akahn@tesla.com Thanks for reporting this issue! We're looking into it and will let you know if we have any questions. |