[GODRIVER-2115] CPU/Memory leak in the Go driver Created: 05/Aug/21  Updated: 22/Aug/22  Resolved: 22/Aug/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.5.1
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Siddhant Mutha Assignee: Qingyang Hu
Resolution: Cannot Reproduce Votes: 0
Labels: post-1.8.0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2021-08-19 at 10.42.48 AM.png     PNG File Screenshot 2021-08-19 at 4.06.09 PM.png     PNG File image-2021-08-05-17-08-29-681.png     PNG File image-2021-08-05-17-17-04-161.png    
Epic Link: Reduce Allocations
Quarter: FY23Q2, FY23Q3

 Description   

Hello

I'm making repeated `Find` calls to MongoDB via the MongoDB Go driver (v1.5.1). I'm noticing that CPU/memory/network usage keeps increasing over a long period of time. pprof profiling hinted that the `Find` library call is what keeps making heap allocations, so I wrote a barebones program that emulates the behaviour and confirmed that it is indeed the driver's `Find` method that is responsible.

Following is the program: 

package main
 
import (
   "context"
   "fmt"
   "github.com/gorilla/mux"
   "go.mongodb.org/mongo-driver/bson"
   "go.mongodb.org/mongo-driver/mongo"
   "go.mongodb.org/mongo-driver/mongo/options"
   "net/http"
   _ "net/http/pprof"
   "time"
)
func main() {
   m := mux.NewRouter()
   m.PathPrefix("/debug/").Handler(http.DefaultServeMux)
   srv := http.Server{
      Addr: ":8889",
      Handler: m,
   }
 
   go func() {
      err := srv.ListenAndServe()
      if err != nil {
         fmt.Println("unable to start server")
      }
   }()
 
   ctx := context.Background()
   dbUser := "dbUser"
   dbName := "dnName"
   dbPwd := "dbPwd"
   collectionName := "collName"
   hostInfos := "localhost:27017"
 
   connectURL := fmt.Sprintf("%s:%s@%s", dbUser, dbPwd, hostInfos)
   mongoConnectURL := "mongodb://%s"
   mongoURI := fmt.Sprintf(mongoConnectURL, connectURL)
 
   clientOptions := options.Client().ApplyURI(mongoURI)
 
   maxIdleTimeout := 15 * time.Minute
   clientOptions.MaxConnIdleTime = &maxIdleTimeout
 
   client, err := mongo.Connect(ctx, clientOptions)
   if err != nil {
      panic(err)
   }
   defer client.Disconnect(ctx) 
 
   col := *client.Database(dbName).Collection(collectionName)
 
   var limit *int64
   a := int64(100)
   limit = &a
 
   key := bson.M{
      ... // bson document
   }
 
   // using a short duration to speed up the observation
   // in reality it is 10 sec, and the CPU/memory increase plays out over a much longer period
   ticker := time.NewTicker(10 * time.Millisecond)
   findOpts := &options.FindOptions{
      Limit: limit,
   }
 
   for {
      select {
      case <-ticker.C:
         fmt.Println("finding...")
         cursor, err := col.Find(context.Background(), key, findOpts)
         if err != nil {
            panic(err)
         }
         err = cursor.Close(ctx)
         if err != nil {
            panic(err)
         }
         fmt.Println("done finding...")
      }
   }
}

 

Notice that the ticker interval is 10ms, whereas in our real program it's 10sec. I've reduced it to speed up the observations.
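Note that the repro opens the cursor and then closes it straight away without reading any documents. For reference, a minimal sketch of a full read loop that could slot into the ticker loop above (the bson.M decode target is just a placeholder, not our real result type) would look like this; I'm not implying that draining the cursor explains the allocations:

cursor, err := col.Find(ctx, key, findOpts)
if err != nil {
   panic(err)
}
// Drain the cursor before closing it; the decode target is a placeholder bson.M.
for cursor.Next(ctx) {
   var doc bson.M
   if err := cursor.Decode(&doc); err != nil {
      panic(err)
   }
}
if err := cursor.Err(); err != nil {
   panic(err)
}
if err := cursor.Close(ctx); err != nil {
   panic(err)
}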

While this runs, the pprof heap profile shows the following:

The heap allocations above keep increasing at a constant rate. 

The pprof CPU profile shows the following:

 

Any idea about this? Is this a known issue, and if so, is there a resolution?
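To double-check whether the growth above is retained memory or just cumulative allocations, one thing I could add to the repro is a snapshot of both heap views for comparison with go tool pprof (the same views are available from the :8889 endpoint via -sample_index=inuse_space vs -sample_index=alloc_space). A minimal standalone sketch, with arbitrary file names:

package main

import (
   "os"
   "runtime/pprof"
)

func main() {
   // "heap" profile: sampled heap allocations; `go tool pprof` defaults to the
   // inuse_space view (memory still live when the snapshot is taken).
   heapFile, err := os.Create("heap.pprof")
   if err != nil {
      panic(err)
   }
   defer heapFile.Close()
   if err := pprof.Lookup("heap").WriteTo(heapFile, 0); err != nil {
      panic(err)
   }

   // "allocs" profile: the same samples, but `go tool pprof` defaults to the
   // alloc_space view (everything allocated since process start, including
   // memory the garbage collector has already reclaimed).
   allocsFile, err := os.Create("allocs.pprof")
   if err != nil {
      panic(err)
   }
   defer allocsFile.Close()
   if err := pprof.Lookup("allocs").WriteTo(allocsFile, 0); err != nil {
      panic(err)
   }
}

Only growth in the in-use view would point to a leak; cumulative allocations grow indefinitely for any program that queries in a loop.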



 Comments   
Comment by Siddhant Mutha [ 22/Aug/22 ]

Thanks @Qingyang @Kevin for looking into this. Sure, we can close this ticket.

Comment by Qingyang Hu [ 11/Aug/22 ]

Hello ssmutha@netskope.com, thank you for providing the test code and profiles. After investigating with our latest version, we did not see evidence of a leak; we believe the observed growth is expected Go garbage collection behavior. We are working on reducing the driver's allocations in other tickets, so we are going to close this one. Please feel free to leave comments if you have any questions or concerns.
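
One way to confirm this locally is to force a collection periodically and watch whether the live heap keeps growing. A minimal standalone sketch of that check, run alongside the query loop (the interval and output format are arbitrary):

package main

import (
   "fmt"
   "runtime"
   "time"
)

func main() {
   var m runtime.MemStats
   for {
      // Force a collection so HeapAlloc only counts objects that are still live.
      runtime.GC()
      runtime.ReadMemStats(&m)
      // A real leak shows the live heap trending upward across iterations;
      // allocation churn that the GC reclaims does not.
      fmt.Printf("live heap: %d bytes, GC cycles: %d\n", m.HeapAlloc, m.NumGC)
      time.Sleep(10 * time.Second)
   }
}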

Comment by Kevin Albertson [ 27/Sep/21 ]

Hello ssmutha@netskope.com, thank you for your patience. We will come back to this investigation after we have completed urgent work for an upcoming release.

Comment by Siddhant Mutha [ 19/Aug/21 ]

Tried using newer versions of the library, 1.6.1 and 1.7.1 (latest). I’m still seeing the same behaviour.

v1.6.1:

 

v1.7.1:

Comment by Siddhant Mutha [ 17/Aug/21 ]

Sure, I'll try that. 

Thanks Matt.

Comment by Matt Dale [ 17/Aug/21 ]

Hey ssmutha@netskope.com, thanks for reporting this issue! I haven't had a chance to thoroughly investigate that CPU profile, but I do know we removed some unnecessary BSON marshalling memory allocations in Go driver v1.6.x. Consider upgrading to Go driver v1.6.1 and see if you observe fewer memory allocations.

I'll follow up when I get a chance to look at your CPU profile in more detail.

Comment by Siddhant Mutha [ 16/Aug/21 ]

Any updates on this?
