[GODRIVER-2750] High goroutine count in connection read/write Created: 06/Feb/23  Updated: 01/May/23  Resolved: 01/May/23

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.9.1, 1.11.1
Fix Version/s: None

Type: Bug Priority: Unknown
Reporter: Luke Mauldin Assignee: Qingyang Hu
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mongo_driver_pprof.png    

 Description   

Summary

The Go application runs normally for a while, then gradually consumes more and more CPU until it saturates one full CPU and mostly stops responding.

The application reads messages from a collection via a tailable cursor. As the application's CPU usage increases, the number of goroutines increases:

go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).write.func2 N=XX
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).read.func2 N=XX

N starts out small (under 10) and then increases to over 700 for both write and read. I have run `db.currentOp(true)` on the Mongo server to confirm connections are not being leaked; most of the time the application has only 3-5 connections.

The application is processing the messages one at a time so it should not require a large number of connections to Mongo.

Versions

Mongo version: 5.0.14

Go version: 1.19.5

Reproduced on mongo driver versions 1.9.1 and 1.11.1.

How to Reproduce

I do not have a reproducible test case outside of the application.

Additional Background

If the problem is in application code, I need guidance on what kind of application code could cause a very high number of connection read/write goroutines while the number of database connections stays low.



 Comments   
Comment by Qingyang Hu [ 01/May/23 ]

Hello luke.mauldin@l3harris.com, we appreciate your report and the detailed information.

Unfortunately, we neither found any significant changes around the suspect code nor were able to reproduce the high goroutine count, so we are closing this ticket. Please feel free to reopen it or file a new one if you observe the issue again.

Thank you again for your support of the MongoDB Go Driver.

Comment by Luke Mauldin [ 12/Feb/23 ]
  1. This behavior is only triggered by certain application batch processing, when a large number of messages are processed by a single instance of the application. Unfortunately, I can't say which Mongo driver version first exhibited the issue; I tested back to 1.9.1 and saw the issue there as well. We are also using Go 1.19.5, though I'm not sure that matters. Previous versions of the application were built with Go 1.17 and 1.18 and did not exhibit the bug.
  2. I updated the current code to closely follow https://github.com/mongodb/mongo-go-driver/blob/0e5ae89f1baf3d6edca093467ffd1f870b441add/mongo/crud_examples_test.go#L908-L929. The previous code was similar but used cursor.Next() and did not check RemainingBatchLength.
Comment by Matt Dale [ 11/Feb/23 ]

Hey luke.mauldin@l3harris.com thanks for the ticket and the update with lots of detail! I'm going to start looking into this early next week. I have a few questions:

  1. Did you start noticing the high CPU usage when you upgraded the Go driver? If so, what is the latest version of the Go driver that doesn't cause the high CPU usage in your application?
  2. Can you share snippets of the code that reads the tailable cursor (i.e. calls Next or TryNext) before and after the code update you mentioned?
Comment by Luke Mauldin [ 07/Feb/23 ]

Documenting additional information: the original code using tailable cursors was written when Mongo driver v1.0.0 was released and had not been updated since. It used cursor.Next(). I updated the code based on the example ExampleCursor_RemainingBatchLength, which uses cursor.TryNext() and cursor.RemainingBatchLength().

The updated code appears to have resolved the performance problem.

Should the documentation be updated to strongly recommend cursor.TryNext() when using a TailableAwait cursor?

I am also including a pprof screenshot that I captured during a time when the application was using 100% of one CPU and in an almost locked state.

Comment by Luke Mauldin [ 07/Feb/23 ]

Please update the title to:

High goroutine count in connection read/write

Generated at Thu Feb 08 08:39:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.