[GODRIVER-1591] Pool event should return client id. Created: 25/Apr/20  Updated: 30/Mar/22

Status: Backlog
Project: Go Driver
Component/s: Monitoring
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: jeet parmar Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Comments   
Comment by jeet parmar [ 05/May/20 ]

That's very helpful.

Thanks for the update; I will go through the POC.

Comment by Divjot Arora (Inactive) [ 05/May/20 ]

nsaoly@gmail.com Apologies for the delay. We were concentrating on releasing v1.3.3 today and I haven't been able to update this ticket. We discussed it during our triage meeting and we think this is a useful feature, so we'll put the ticket into "Open" to be done after the work we've currently scheduled.

As I mentioned in a previous comment, I think you may be able to handle your use case without this feature. If not, you can also consider wrapping the mongo.Client type in a new type that also has an ID and tracking that ID in the client's pool monitor. This way, you know exactly which Client caused a PoolEvent. I did a small POC for a stateful PoolMonitor last week that shows the basic setup I'm envisioning: https://github.com/divjotarora/pool-monitor-poc.
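To make that concrete, here is a minimal sketch of the idea (the tenantClient type, connectTenant function, and onClosed callback are illustrative names, not part of the driver or the POC): each Client gets its own PoolMonitor whose callback closes over an application-level ID, so every PoolEvent can be attributed to the Client that produced it.

package tenantdb

import (
	"context"
	"time"

	"go.mongodb.org/mongo-driver/event"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// tenantClient pairs a *mongo.Client with an application-level ID so that
// pool events can be attributed to a specific client.
type tenantClient struct {
	ID     string
	Client *mongo.Client
}

// connectTenant creates a Client whose PoolMonitor closes over tenantID, so
// every PoolEvent it receives is known to belong to that client.
func connectTenant(ctx context.Context, uri, tenantID string, onClosed func(tenantID string)) (*tenantClient, error) {
	monitor := &event.PoolMonitor{
		Event: func(evt *event.PoolEvent) {
			if evt.Type == event.ConnectionClosed {
				// The closure captures tenantID, so the caller knows which
				// client's pool produced this event.
				onClosed(tenantID)
			}
		},
	}

	opts := options.Client().
		ApplyURI(uri).
		SetMaxConnIdleTime(5 * time.Minute).
		SetPoolMonitor(monitor)

	client, err := mongo.Connect(ctx, opts)
	if err != nil {
		return nil, err
	}
	return &tenantClient{ID: tenantID, Client: client}, nil
}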

Comment by jeet parmar [ 05/May/20 ]

@Divjot

I understand you are bogged down with tons of issues; however, I just wanted to know the process and timeline for this so that I can plan my work accordingly.

Thanks

Jeet

Comment by jeet parmar [ 30/Apr/20 ]

Sure, that will be great. Thanks again @Divjot.

Comment by Divjot Arora (Inactive) [ 30/Apr/20 ]

30 seconds should be plenty for most use cases, but this is completely dependent on your network. If these connection errors are coming from a pool that is configured with a PoolMonitor, my guess is that you'll see events with type GetFailed in your monitor.

As far as having the client ID in the event, I think this could be a useful addition. Now that I have a better understanding of your use case, I'll discuss with the team and comment here once we've made a decision.

– Divjot

Comment by jeet parmar [ 30/Apr/20 ]

Thanks, this helps a lot.
I was able to validate what you said and was checking whether I had written some buggy code.

Meanwhile, just to see what might be wrong: my AWS Lambda logs are filled with the errors below. I am connecting to MongoDB Atlas, but this happens only on re-connection after an idle timeout, not on fresh connections. I thought the pool event was triggered because of these errors.

 

connection() : handshake failure: connection(cluster0-shard-00-01xxxx.mongodb.net:27017[-14]) incomplete read of message header: read tcp 169.254.76.1:36800->x.x.x.x:27017: i/o timeout

connection(cluster0-shard-00-01-xxxxx.mongodb.net:27017[-13]) incomplete read of message header: read tcp 169.254.76.1:42996->x.x.x.x:27017: i/o timeout

"connection() : dial tcp x.x.x.x:27017: i/o timeout
 Failed to create index 
 go.teradata.com/infniti/commons/utils.CheckError
 /opt/teradata/software/go/pkg/mod/go.teradata.com/infniti/commons@v0.0.0-20200425134741-04a2b9ac0a2c/utils/err_utils.go:41
 go.teradata.com/infniti/commons/db.ensureIndex
 /opt/teradata/software/go/pkg/mod/go.teradata.com/infniti/commons@v0.0.0-20200425134741-04a2b9ac0a2c/db/mng_ops.go:128
 go.teradata.com/infniti/commons/db.(*MongoIndex).PerformMongoOperation
 /opt/teradata/software/go/pkg/mod/go.teradata.com/infniti/commons@v0.0.0-20200425134741-04a2b9ac0a2c/db/mng_ops.go:49
 go.teradata.com/infniti/commons/db.(*Connection).DoMongoOperation
 /opt/teradata/software/go/pkg/mod/go.teradata.com/infniti/commons@v0.0.0-20200425134741-04a2b9ac0a2c/db/mng_ops.go:72
 go.teradata.com/infniti/commons/db.(*Connection).mayBeCreateIndex
 /opt/teradata/software/go/pkg/mod/go.teradata.com/infniti/commons@v0.0.0-20200425134741-04a2b9ac0a2c/db/mng_ops.go:180
 runtime.goexit
 /usr/local/opt/go/libexec/src/runtime/asm_amd64.s:1373 "

 

 

Do you think it's because my client creation sets a 30-second socket timeout, and during re-connection the Docker container is not able to connect within 30 seconds, causing the above error? I guess 30 seconds is more than enough, but I'll try increasing it anyway.

 

// 30-second socket timeout on the client options.
clientOptions.SetSocketTimeout(30 * time.Second)

// 30-second timeout for the initial connect.
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
logger.Debugf(nil, "Initiate DB connection using uri: %s", uri)
client, err := mongo.Connect(ctx, clientOptions)

 

Apart from this...

We do call Client.Disconnect in some other places. Do you think it makes sense to have a client reference in the pool event, so we know which client's pool connection the event belongs to?

 

Comment by Divjot Arora (Inactive) [ 30/Apr/20 ]

Thanks for the detailed explanation nsaoly@gmail.com. I have a much better understanding of your use case. I have some thoughts/questions that I hope will clarify things further.

The first thing to note is that a Client is not the same as a connection. If you're connected to a 3-node replica set, the Client will maintain a connection pool per-node in the set. Each pool has zero or more connections in it. When you set MaxIdleTime to five minutes, you're saying that any connection in any pool should be closed after five minutes of inactivity. Closing the connection does not disconnect or invalidate the Client, however.

I think it's ok to keep the Client in your cache even if there's no activity on it for >5 minutes. The next time you get a request from that tenant, you'll use the Client to execute a database request. The Client will select a node (e.g. if it's a write operation, the primary will be selected) and then try to get a connection from that node's connection pool. The pool will be empty as all connections were closed after 5 minutes of idle time, so the pool will create a new connection, which will be used for the operation and then put back into the pool. This design allows a Client to account for load spikes followed by periods of inactivity.
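To illustrate (a rough sketch; the URI and database/collection names below are assumed for the example, not taken from your setup): the same Client can be reused across an idle period longer than MaxConnIdleTime, and only the pooled connections are recycled.

package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Idle pooled connections are closed after five minutes, but the Client
	// itself stays valid.
	opts := options.Client().
		ApplyURI("mongodb://localhost:27017"). // assumed URI
		SetMaxConnIdleTime(5 * time.Minute)

	client, err := mongo.Connect(context.Background(), opts)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(context.Background())

	coll := client.Database("app").Collection("items") // assumed names

	// First operation: the selected node's pool creates a connection.
	if _, err := coll.InsertOne(context.Background(), bson.M{"x": 1}); err != nil {
		log.Fatal(err)
	}

	// ... more than five minutes of inactivity: the idle connection is
	// closed, but the Client and its pools remain usable ...

	// Next operation: the (now empty) pool transparently creates a fresh
	// connection; no Client.Disconnect/reconnect is needed.
	if err := coll.FindOne(context.Background(), bson.M{"x": 1}).Err(); err != nil {
		log.Fatal(err)
	}
}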

The only reason you might want to actually call Client.Disconnect if a connection is closed for being idle is if there's a chance that you'll never get a request from that tenant again. If this is the case, then your current approach makes sense.

Hopefully this clarifies the behavior of the driver. Can you let me know if this impacts your current approach?

Comment by jeet parmar [ 30/Apr/20 ]

@Divjot

Thanks for the explanation.
Below is my use case.
When I get a request in my HTTP handler, I also get a tenant ID in the header.
I then set up a new DB connection for each such tenant and store it in a map:

{t1 -> *mongo.Client, t2 -> *mongo.Client}

so that the next time I get a request from the same tenant, I can directly use the *mongo.Client for that tenant to run queries.

This means that once I serve a request, I am not explicitly closing the connection / releasing the *mongo.Client, so as to avoid the cost of re-connection for subsequent requests.
However, to avoid holding connections open unnecessarily, when I create a client I set clientOptions.SetMaxConnIdleTime(5 * time.Minute).
So if I don't get a request for the next 5 minutes, the driver will automatically close the idle connection, and the pool monitor then reports the connection as closed (idle / stale).

However, when a connection is closed I would like to remove the *mongo.Client from the cache, so that when the next request comes in it will create a new *mongo.Client and store it in the cache for later use.

The problem is that the pool monitor does not give any reference to the *mongo.Client whose connection it closed or established. So there is nothing I can use to evict the cache entry (or build one on connection success), which causes problems for further queries, because they pick up a *mongo.Client whose connection was already closed while it was idle.
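Roughly, this is what I mean (a sketch with illustrative names such as clientForTenant and evictTenant; the eviction part is exactly what I cannot trigger today, because the PoolEvent gives me nothing to key the cache on):

package tenantdb

import (
	"context"
	"sync"
	"time"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

var (
	cacheMu   sync.Mutex
	connCache = map[string]*mongo.Client{}
)

// clientForTenant returns the cached client for a tenant, creating and
// caching one if necessary.
func clientForTenant(ctx context.Context, tenant, uri string) (*mongo.Client, error) {
	cacheMu.Lock()
	defer cacheMu.Unlock()

	if c, ok := connCache[tenant]; ok {
		return c, nil // reuse in a warm Lambda context
	}

	opts := options.Client().ApplyURI(uri).SetMaxConnIdleTime(5 * time.Minute)
	c, err := mongo.Connect(ctx, opts)
	if err != nil {
		return nil, err
	}
	connCache[tenant] = c
	return c, nil
}

// evictTenant removes a tenant's client from the cache and disconnects it.
// This is what I would like the pool monitor to trigger, if PoolEvent exposed
// a client reference/ID as requested in this ticket.
func evictTenant(ctx context.Context, tenant string) {
	cacheMu.Lock()
	defer cacheMu.Unlock()
	if c, ok := connCache[tenant]; ok {
		_ = c.Disconnect(ctx)
		delete(connCache, tenant)
	}
}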

I hope I was able to put my thoughts across clearly; please feel free to ask any questions for better clarity.

Thanks
Jeet

Comment by Divjot Arora (Inactive) [ 29/Apr/20 ]

Hi nsaoly@gmail.com,

This is definitely something we can consider doing as a feature. For your use case, though, I'm not sure that closing a Client makes sense if a connection gets killed. The driver can detect changes such as a node going down/restarting or a primary failover and automatically recover from these. Also, the driver maintains a connection pool for each node in the cluster, so a connection being killed probably means that the node is unreachable, not that the entire cluster is unreachable. Can you explain your use case further so we can provide guidance?

– Divjot

Comment by jeet parmar [ 25/Apr/20 ]

I could take up this task and raise a PR, but I would need some pointers / direction and a go / no-go decision.

Comment by jeet parmar [ 25/Apr/20 ]

We are using MongoDB with a Go runtime on AWS Lambda.

We are using mongo-go-driver to create multi-tenant connections (each tenant has a separate database).

To do this, we create a NewClient() for each tenant and then store it in a global cache for future reference, so as to avoid frequent connections and disconnections:

// Reuse a cached client for this tenant if one exists.
tConn, found := connCache[t]
if found {
   logger.Debugf(nil, "Got DB connection from cache for tenant %s", t)
   return tConn
}
// Otherwise create a new client and cache it for subsequent requests.
mClient := connect(mURI)
connCache[t] = mClient

Furthermore, we keep this cache outside the AWS Lambda handler as a global variable to avoid frequent connects in a warm context.

We are using a PoolMonitor to monitor the connections:

poolMonitor := &event.PoolMonitor{
	Event: func(evt *event.PoolEvent) {
		switch evt.Type {
		case event.GetFailed:
			// evt.Reason describes why the connection checkout failed.
			logger.Debugf(nil, "PoolMonitor: Connection to db failed, reason: %s", evt.Reason)
		case event.ConnectionClosed:
			// evt.ConnectionID is a uint64, so log it with %d.
			logger.Debugf(nil, "PoolMonitor: Connection to db closed: %d", evt.ConnectionID)
		case event.PoolClosedEvent:
			logger.Debugf(nil, "PoolMonitor: Connection pool closed for: %s", evt.Address)
		case event.GetSucceeded:
			logger.Debugf(nil, "PoolMonitor: Connection to db succeeded: %d", evt.ConnectionID)
		}
	},
}

The problem I am facing is that I want to remove the client reference from the cache if its connection gets killed when idle or stale.

Since the pool event does not return any reference by which I can identify the client, I am unable to do so.

I think it would be great if PoolEvent could return some client reference, e.g. a client ID.
