[GODRIVER-1750] memory leak in topology.Server Created: 16/Sep/20 Updated: 28/Oct/23 Resolved: 30/Apr/21 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | Server Discovery and Monitoring |
| Affects Version/s: | 1.4.0 |
| Fix Version/s: | 1.5.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Pierre Durand | Assignee: | Divjot Arora (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux, Go 1.15 |
| Attachments: | |
| Description |
|
I've noticed that my application is leaking memory in `topology.Server`. I think it's not closing the "context cancel" properly; see the attached screenshot (from DataDog continuous profiling). It started happening when I upgraded to the Go MongoDB driver v1.4.0; it was perfectly fine with v1.3.x.

I'm always using the same MongoDB "client" object: I create it at application startup and keep using the same object, so I'm not creating thousands of client objects. My application is a monitoring tool that periodically fetches the "current op" by calling `RunCommand()` with `{"currentOp": 1, "active": true}`. I suspect the leak is related to this, because I don't notice similar issues in my other apps (which run normal "find" commands). |
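For context, a minimal sketch of that usage pattern (the URI, interval, and result handling are illustrative assumptions, not the reporter's actual code):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	// One client, created once at startup and reused for the whole process.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Periodically fetch the current operations, as described above.
	for range time.Tick(10 * time.Second) {
		cmd := bson.D{{Key: "currentOp", Value: 1}, {Key: "active", Value: true}}
		var result bson.M
		if err := client.Database("admin").RunCommand(ctx, cmd).Decode(&result); err != nil {
			log.Println("currentOp failed:", err)
		}
	}
}
```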
| Comments |
| Comment by Githook User [ 30/Apr/21 ] |
|
Author: Divjot Arora (divjotarora, divjot.arora@10gen.com) Message: |
| Comment by Githook User [ 30/Apr/21 ] |
|
Author: Divjot Arora (divjotarora, divjot.arora@10gen.com) Message: |
| Comment by Divjot Arora (Inactive) [ 28/Apr/21 ] |
| Comment by Divjot Arora (Inactive) [ 05/Apr/21 ] |
|
I think I've narrowed down the issue: the context may not be cancelled if we fail to create the heartbeat connection. In that case, the next iteration will try to create a new connection and context, which can leak the previous context. This also explains my inability to reproduce, as the leak would only occur if the target server were unreachable for a long period of time.

noobfriend@gmail.com Thank you for your patience and code samples while we've investigated this ticket. I'm going to mark it with fixVersion 1.5.2 and put it into the "Scheduled" state so that we can release a fix in our next patch release. |
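A simplified model of the failure mode described above (hypothetical helpers, not the driver's actual code): a cancellable child context is created for every heartbeat connection attempt, and on the failure path the cancel function is dropped without being called. Each leaked child stays registered with its long-lived parent until the parent is cancelled, so memory grows for as long as the server stays unreachable.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// dialHeartbeat is a hypothetical stand-in for creating the monitoring
// connection; here it always fails, as an unreachable server would.
func dialHeartbeat(ctx context.Context) (interface{}, error) {
	return nil, errors.New("server unreachable")
}

func heartbeatLoop(ctx context.Context) {
	for ctx.Err() == nil {
		// A cancellable child context is created for every attempt.
		hbCtx, hbCancel := context.WithCancel(ctx)
		conn, err := dialHeartbeat(hbCtx)
		if err != nil {
			// BUG: hbCancel is never called on this path, so the child
			// context stays in the parent's children set; each retry
			// leaks one context. The fix is to call hbCancel() here.
			time.Sleep(500 * time.Millisecond)
			continue
		}
		_ = conn // ... use the connection for heartbeats ...
		hbCancel()
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	go heartbeatLoop(ctx)
	time.Sleep(2 * time.Second) // let a few leaky iterations accumulate
}
```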
| Comment by Divjot Arora (Inactive) [ 25/Mar/21 ] |
|
Thank you for the new code changes. I'll continue to look at this today and next week. |
| Comment by 孟修 N/A [ 25/Mar/21 ] |
|
Hi Divjot Arora, thanks for the update. I found that the change would also lead to some side effects in connection handling. |
| Comment by Divjot Arora (Inactive) [ 22/Mar/21 ] |
|
I wanted to provide a quick update. I don't think we want to call "ProcessHandshakeError" here, as that function is meant for processing errors in pooled connections, and errors in monitoring connections are handled separately. In the next few days, I'll dig deeper and try to figure out why calling ProcessHandshakeError reduces the memory pressure, and how we can get the same effect without actually calling that function. |
| Comment by Divjot Arora (Inactive) [ 11/Mar/21 ] |
|
Thanks for investigating so thoroughly. I'm going to put this ticket in an "Investigating" state to give myself some time to look over your comments and the commit you linked. Based on that, we'll try to schedule it to be worked on ASAP so we can fix this in our next patch release. – Divjot |
| Comment by 孟修 N/A [ 11/Mar/21 ] |
|
I tested with a simple change and ran my server for two days. The change I made: https://github.com/flf2ko/mongo-go-driver/commit/25a25c3f2f75616f4e318e641c41cf49927f6b56 Inuse space:
|
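For reference, inuse_space numbers like these are typically captured from a long-running Go server via net/http/pprof; a minimal sketch (the listen address is an assumption, and this may not be how the commenter collected theirs):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Serve the profiling endpoints on a side port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the application's real work would run here ...
	select {}
}
```

The before/after heap comparison can then be made with `go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap`.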
| Comment by 孟修 N/A [ 10/Mar/21 ] |
|
Hi, I have a guess about this issue and am still working on verifying it.

1. If `(*Connection).processInitializationError` happens in the heartbeat goroutine, it uses `(*Connection).config.errorHandlingCallback` to do the cancellation: https://github.com/mongodb/mongo-go-driver/blob/v1.4.6/x/mongo/driver/topology/connection.go#L131
2. In this case, `(*Connection).config` is derived from `(*Server).createConnection`, which does not include the `withErrorHandlingCallback(s.ProcessHandshakeError)` ConnectionOption. Note that in the pool case, the callback is set in https://github.com/mongodb/mongo-go-driver/blob/v1.4.6/x/mongo/driver/topology/server.go#L178 , but in the heartbeat case this ConnectionOption is not included in `(*Server).cfg`: https://github.com/mongodb/mongo-go-driver/blob/v1.4.6/x/mongo/driver/topology/server.go#L579
3. Therefore, `heartbeatCtxCancel` won't be called when `processInitializationError` happens: https://github.com/mongodb/mongo-go-driver/blob/v1.4.6/x/mongo/driver/topology/server.go#L586-L590
|
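A condensed model of the mismatch in steps 1-3 (hypothetical names, not the driver's real types): the error-handling callback that lets the server react to a failed handshake is configured for pooled connections but left nil for the heartbeat connection, so nothing ever reaches `heartbeatCtxCancel`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// connConfig models (*Connection).config: the callback is optional.
type connConfig struct {
	errorHandlingCallback func(error)
}

// processInitializationError models the behavior at connection.go#L131:
// the callback runs only if it was configured.
func processInitializationError(cfg connConfig, err error) {
	if cfg.errorHandlingCallback != nil {
		cfg.errorHandlingCallback(err)
	}
}

func main() {
	_, heartbeatCtxCancel := context.WithCancel(context.Background())
	handshakeErr := errors.New("handshake failed")

	// Pool path: withErrorHandlingCallback is applied, so the server
	// hears about the error and cleanup can run.
	pooled := connConfig{errorHandlingCallback: func(err error) {
		fmt.Println("ProcessHandshakeError called:", err)
		heartbeatCtxCancel()
	}}
	processInitializationError(pooled, handshakeErr)

	// Heartbeat path: createConnection never adds the option, so the
	// callback is nil and the cancel function is never reached.
	heartbeat := connConfig{}
	processInitializationError(heartbeat, handshakeErr)
}
```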
| Comment by Backlog - Core Eng Program Management Team [ 09/Mar/21 ] |
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information. |
| Comment by 舜元 施 [ 23/Feb/21 ] |
|
Yes, it's a long-running application with a slow memory leak. We will keep investigating the issue. We've already found that it only happens when the app connects to a replica set cluster that spans different data centers (there is no issue when connecting to a local DB cluster). |
| Comment by Divjot Arora (Inactive) [ 22/Feb/21 ] |
|
Thanks for the attached memory profile. Can you provide more information on how this profile was taken? Is this a snapshot from a long-running application? Additionally, is it possible for you to provide a Go program that we can run which illustrates the memory issue? – Divjot |
| Comment by 舜元 施 [ 22/Feb/21 ] |
|
Hi, I also encountered this issue while profiling version v1.4.x.
|
| Comment by Esha Bhargava [ 12/Oct/20 ] |
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please comment on this ticket if you can provide the requested information. |
| Comment by Divjot Arora (Inactive) [ 22/Sep/20 ] |
|
Hi pierrre, I'm trying to look into this, but I haven't been able to reproduce it so far. v1.4.0 of the driver did introduce a new heartbeating protocol that's used by the driver's internal monitoring routine. The implementation of that protocol uses contexts and cancellation functions to signal some state changes. However, I believe the cancellation function is always called for the created contexts, so there should not be a memory leak. In your testing, do you see memory constantly increase, or is it that your application is using more memory than it did before but it isn't increasing as the application runs? Also, do you have a small reproducible example that we could run against a MongoDB server to debug this issue? – Divjot |
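The invariant described here, every created context paired with a cancellation function that is called on all exit paths, looks roughly like this (a generic sketch of the pattern, not the driver's code):

```go
package main

import (
	"context"
	"time"
)

// runHeartbeat is a hypothetical stand-in for one monitoring check.
func runHeartbeat(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// checkServer pairs the context it creates with a deferred cancel, so the
// cancel runs on every return path and no context can outlive the check.
func checkServer(parent context.Context) error {
	ctx, cancel := context.WithTimeout(parent, 100*time.Millisecond)
	defer cancel()
	return runHeartbeat(ctx)
}

func main() {
	_ = checkServer(context.Background())
}
```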