[GODRIVER-2037] Don't clear the connection pool on Context timeout during handshake Created: 02/Jun/21  Updated: 28/Oct/23  Resolved: 30/Jun/21

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.5.4

Type: Bug Priority: Major - P3
Reporter: Matt Dale Assignee: Matt Dale
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to GODRIVER-2024 Connection pool, long semaphore wait ... Closed
related to GODRIVER-2068 Replace all uses of isPoolCleared() a... Closed
related to GODRIVER-2138 Remove unnecessary operation Context ... Closed
Epic Link: Connection pool improvements

 Description   

Currently the driver clears the connection pool on any error encountered during a new connection handshake (see topology.Server#ProcessHandshakeError). A driver-side timeout (e.g. a Context cancellation) doesn't indicate a problem with the connections in the pool the way a network error does. Don't clear the connection pool when the error encountered during a new connection handshake is a driver-side timeout.

E.g. similar logic from topology.Server#ProcessError:

// ...
if wrappedConnErr == context.Canceled || wrappedConnErr == context.DeadlineExceeded {
	return driver.NoChange
}
// ...



 Comments   
Comment by Githook User [ 30/Jun/21 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2037 Don't clear the connection pool on client-side connect timeout errors. (#688)
Branch: release/1.5
https://github.com/mongodb/mongo-go-driver/commit/19a4ed190107d5ed23aff8c0eb8ae3cb8c9717c4

Comment by Githook User [ 30/Jun/21 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2037 Don't clear the connection pool on client-side connect timeout errors. (#688)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/5199a0b7a47957a1679dd6b861463ebf5bd40cde

Comment by Matt Dale [ 16/Jun/21 ]

Investigating this further, the change is possible by only modifying Server#ProcessHandshakeError, although the logic is somewhat convoluted:

func (s *Server) ProcessHandshakeError(...) {
	// ...
	isTimeout := func(err error) bool {
		// Handle any self-reporting timeout.
		if os.IsTimeout(err) {
			return true
		}
		// Extract any wrapped errors from a *net.OpError.
		if opErr, ok := err.(*net.OpError); ok {
			err = opErr.Err
		}
 
		// Extract any wrapped errors that implement Unwrap()
		for {
			wrapper, ok := err.(interface{ Unwrap() error })
			if !ok {
				break
			}
			err = wrapper.Unwrap()
		}
 
		// Unwrap() may return nil; guard so that err.Error() below can't panic.
		if err == nil {
			return false
		}

		return err == context.Canceled ||
			err == context.DeadlineExceeded ||
			// Handle the case where the error has been replaced by net.errCanceled, which isn't exported
			// and can't be compared directly.
			err.Error() == "operation was canceled"
	}
 
	if isTimeout(wrappedConnErr) {
		return
	}
	// ...
}

It's unclear how many cases the Unwrap loop handles, but it could be important for completeness. I tested the code using an explicitly cancelled or timed-out Context when calling DialContext in topology.connection#connect, and only two conditions were hit:

  1. os.IsTimeout caught Context deadline exceeded conditions
  2. err.Error() == "operation was canceled" caught Context canceled conditions

We could probably clean that logic up by handling the most common conditions (i.e. *net.OpError with net-specific error messages), but it basically works.
