[GODRIVER-2109] Race condition in staleness checks if connection has not finished connecting Created: 28/Jul/21  Updated: 28/Oct/23  Resolved: 02/Sep/21

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: 1.7.0
Fix Version/s: 1.7.2, 1.6.2

Type: Bug Priority: Critical - P2
Reporter: Divjot Arora (Inactive) Assignee: Matt Dale
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
is duplicated by GODRIVER-2135 Race condition in connection pool coo... Closed
Related
related to GODRIVER-2151 Retract Go Driver v1.6.0-v1.6.1 and v... Closed
Cloud Backport: Needed

 Description   

We're seeing a race condition in tests that set MinPoolSize when upgrading to 1.7.0. Relevant pieces of the race detector output:

WARNING: DATA RACE
 [2021/07/28 18:53:25.035]        | Write at 0x00c000481300 by goroutine 66:
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).connect()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/connection.go:231 +0x106b
 [2021/07/28 18:53:25.035]        |
 [2021/07/28 18:53:25.035]        | Previous read at 0x00c000481300 by goroutine 115:
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).stale()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/pool.go:199 +0x235
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.connectionExpiredFunc()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/pool.go:97 +0x1e5
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).Get()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/resource_pool.go:125 +0x20c
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).get()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/pool.go:388 +0x3f9
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connection()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/topology/server.go:266 +0xf4
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*SelectedServer).Connection()
 [2021/07/28 18:53:25.035]        |       <autogenerated>:1 +0x78
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver.Operation.getServerAndConnection()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/operation.go:246 +0x113
 [2021/07/28 18:53:25.035]        |   go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute()
 [2021/07/28 18:53:25.035]        |       /data/mci/d42eeec3f9680fb6eb9086010f77b346/src/github.com/10gen/pkg/mod/go.mongodb.org/mongo-driver@v1.7.0/x/mongo/driver/operation.go:301 +0x117

My understanding is that a connection resource will be added to the resource pool in the foreground while the actual establishment is done in the background. When resourcePool#Get is called, it may iterate over this resource and check if it's expired even though it hasn't finished connecting. This can probably be mitigated by a mutex/atomic, but it brings up a larger question about whether we should even be checking connections for expiration if we know they're not done connecting yet.



 Comments   
Comment by Matt Dale [ 07/Sep/21 ]

james@getadmiral.com thanks for the retraction suggestion! I forgot to include that in the v1.7.2 release, but I've created GODRIVER-2151 to track retracting the affected Go module versions as part of the v1.7.3 release.

Comment by Matt Dale [ 02/Sep/21 ]

Fix for this is available in Go Driver v1.6.2 and v1.7.2 releases.

Comment by Matt Dale [ 02/Sep/21 ]

Does this need a backport for the cloud-1.7.1 branch?

Comment by Githook User [ 01/Sep/21 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2109 Prevent a data race between connecting and checking out a connection from the resourcePool. (#728)
Branch: release/1.6
https://github.com/mongodb/mongo-go-driver/commit/dca4f1d7a1aa0b7b0b7d09b42489a40481df3004

Comment by Githook User [ 01/Sep/21 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2109 Prevent a data race between connecting and checking out a connection from the resourcePool. (#728)
Branch: release/1.7
https://github.com/mongodb/mongo-go-driver/commit/76413851e872c9969d338e472e14d2609e46bbc6

Comment by Matt Dale [ 01/Sep/21 ]

Fix PR #728 is merged. We're planning to do an expedited release, so it should be available sometime this week. I'm starting the release process now and will update this ticket when the release is ready.

Comment by Githook User [ 01/Sep/21 ]

Author:

{'name': 'Matt Dale', 'email': '9760375+matthewdale@users.noreply.github.com', 'username': 'matthewdale'}

Message: GODRIVER-2109 Prevent a data race between connecting and checking out a connection from the resourcePool. (#728)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/33fac989d3a3f042cd94b5aa3400accc0fac04a3

Comment by Peter Smith [ 30/Aug/21 ]

Amazing, thanks for the update matt.dale 🙏

Comment by Matt Dale [ 30/Aug/21 ]

adinoyi.omuya and peter.smith, this fix is currently scheduled for release with Go Driver v1.7.2 on Tuesday, Sept 7. I'm starting the implementation today, so I'll post any timeline updates to this ticket.

Comment by Peter Smith [ 30/Aug/21 ]

Hi matt.dale 👋  We have a customer in POC right now affected by this Go Driver race condition, in our last sync call, I said I 'd get them an update on the resolution progress. 

Would you be able to give me something I can share to keep them in the loop?

Really appreciate anything 🙏

Peter

Comment by James Hartig [ 27/Aug/21 ]

Can you retract the affected versions so people don't unexpectedly update to a newer release and run into this? We had no idea about this until we updated our modules and now were seeing these race complaints in all of our repos.

Comment by Adinoyi Omuya [ 23/Aug/21 ]

This is impacting completion of some additional work on the ADL side - e.g. MHOUSE-3261 - do y'all have a rough estimate for when this might be fixed?

Comment by Alexej Kubarev [ 17/Aug/21 ]

Same data race arises on the first operation on connection in a pool, even with minPoolSize=1 (so basically single connection that receives an operation) with v1.7.1 driver. 

While the triggering methods are different (e.g CreateIndex) the race condition happens in the same place. So guards around that should be put in place.

Reproduction steps

1. Use a simple application, such as this:

 

package main
 
import (
   "context"
 
   "go.mongodb.org/mongo-driver/mongo"
   "go.mongodb.org/mongo-driver/mongo/options"
   "go.mongodb.org/mongo-driver/mongo/readpref"
 
   "time"
)
 
func main() {
   ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
   defer cancel()
 
   client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017/?minPoolSize=5"))
   if err != nil {
      panic(err)
   }
 
   err = client.Ping(ctx, readpref.Primary())
   if err != nil {
      panic(err)
   }
}

2. Run with go run -race main.go

3. Observe data race printout due to the first issues command (Ping in this case)

==================
WARNING: DATA RACE
Read at 0x00c0002a6e00 by main goroutine:
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).stale()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:199 +0x235
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.connectionExpiredFunc()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:97 +0x1e5
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).Get()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/resource_pool.go:125 +0x20c
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).get()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:388 +0x3f9
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connection()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/server.go:268 +0xf4
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*SelectedServer).Connection()
      <autogenerated>:1 +0x78
  go.mongodb.org/mongo-driver/x/mongo/driver.Operation.getServerAndConnection()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/operation.go:246 +0x113
  go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/operation.go:301 +0x117
  go.mongodb.org/mongo-driver/x/mongo/driver/operation.(*Command).Execute()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/operation/command.go:104 +0x34e
  go.mongodb.org/mongo-driver/mongo.(*Database).RunCommand()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/mongo/database.go:203 +0x284
  go.mongodb.org/mongo-driver/mongo.(*Client).Ping()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/mongo/client.go:265 +0x23a
  main.main()
      /home/alexej/temp/main.go:22 +0x1ca
 
Previous write at 0x00c0002a6e00 by goroutine 14:
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/connection.go:231 +0x106bGoroutine 14 (finished) created at:
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).connectionInitFunc()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:134 +0xaf
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).connectionInitFunc-fm()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:128 +0x4a
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).add()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/resource_pool.go:102 +0x1fc
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).Maintain()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/resource_pool.go:210 +0x26f
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).initialize()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/resource_pool.go:95 +0x124
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/pool.go:208 +0xaa
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/server.go:215 +0x239
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.ConnectServer()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/server.go:136 +0xe9
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/topology.go:686 +0x1a6
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).Connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/x/mongo/driver/topology/topology.go:213 +0x10af
  go.mongodb.org/mongo-driver/mongo.(*Client).Connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/mongo/client.go:156 +0x6fb
  go.mongodb.org/mongo-driver/mongo.Connect()
      /home/alexej/go/pkg/mod/go.mongodb.org/mongo-driver@v1.7.1/mongo/client.go:106 +0x90
  main.main()
      /home/alexej/temp/main.go:17 +0x16e
==================
Found 1 data race(s)

Generated at Thu Feb 08 08:37:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.