[GODRIVER-1234] Deadlock on 1.1.0 release Created: 13/Aug/19  Updated: 28/Oct/23  Resolved: 30/Aug/19

Status: Closed
Project: Go Driver
Component/s: Core API
Affects Version/s: None
Fix Version/s: 1.2.0

Type: Bug Priority: Major - P3
Reporter: David Bartley Assignee: Isabella Siu (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
backported by GODRIVER-1290 Backport "Deadlock on 1.1.0 release" Closed

 Description   

We recently attempted to upgrade to the 1.1.0 release and one of our tests caught a deadlock in Topology.Connect.

goroutine 4 [semacquire]:
sync.runtime_SemacquireMutex(0xc0002461d0, 0x0)
    /usr/local/Cellar/go/1.12.7/libexec/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc0002461cc)
    /usr/local/Cellar/go/1.12.7/libexec/src/sync/mutex.go:134 +0x109
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).apply(0xc000246140, 0x17427a0, 0xc0000ca008, 0xc0000aac40, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/topology.go:526 +0x54
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer.func1(0xc0000aac40, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc000174820, 0xf, 0xffffff7f, ...)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/topology.go:578 +0x81
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).updateDescription(0xc0002461e0, 0xc0000aac40, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc000174820, 0xf, ...)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:431 +0x2d4
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.NewServer.func1(0xc0000aac40, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc000174820, 0xf, 0xffffff7f, ...)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:146 +0x70
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.newConnection(0x17427a0, 0xc0000ca000, 0xc0000aac40, 0xf, 0xc0001f06c0, 0x7, 0xc, 0xdf8475800, 0x8, 0x1601978)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/connection.go:106 +0x727
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).makeNewConnection(0xc0001f0720, 0x17427a0, 0xc0000ca000, 0xc000001500, 0xc0000ce600, 0x3, 0x104ca01, 0x105f390)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/pool.go:266 +0x73
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).connectionInitFunc(0xc0001f0720, 0x104db3c, 0x1b0b340)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/pool.go:124 +0x44
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).add(0xc0001ee370, 0x0)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/resource_pool.go:99 +0xf4
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).Maintain(0xc0001ee370)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/resource_pool.go:176 +0xf8
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*resourcePool).initialize(0xc0001ee370)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/resource_pool.go:92 +0xb0
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*pool).connect(...)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/pool.go:190
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connect(0xc0002461e0, 0xc000230380, 0xc00022a4e0, 0x3)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:172 +0x16f
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.ConnectServer(0xc0000aac40, 0xf, 0xc000230380, 0xc00022a4e0, 0x3, 0x4, 0x0, 0x1, 0xc0002aa000)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:113 +0x86
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer(0xc000246140, 0xc0000aac40, 0xf, 0x0, 0x1)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/topology.go:580 +0xd2
git.corp.stripe.com/stripe-internal/gocode/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).Connect(0xc000246140, 0x1, 0x1)
    go.mongodb.org/mongo-driver/x/mongo/driver/topology/topology.go:138 +0x209

It looks like Topology.Connect takes out a lock on Topology.serversLock and calls Topology.addServer, but then Topology.addServer is recursively called, and calls Topology.apply, which also takes a lock on Topology.serversLock.



 Comments   
Comment by Githook User [ 03/Sep/19 ]

Author:

{'username': 'iwysiu', 'email': 'isabella.siu@10gen.com', 'name': 'iwysiu'}

Message: GODRIVER-1234 deadlock with minPoolSize

Change-Id: I478a49ae07c8106c88db520f42b97d27c1b2448d
Branch: release/1.1
https://github.com/mongodb/mongo-go-driver/commit/a08c08983d5fe1696e780ac332eaf5e74b7e9408

Comment by Githook User [ 30/Aug/19 ]

Author:

{'name': 'iwysiu', 'username': 'iwysiu', 'email': 'isabella.siu@10gen.com'}

Message: GODRIVER-1234 deadlock with minPoolSize

Change-Id: I478a49ae07c8106c88db520f42b97d27c1b2448d
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/dc939a3122e69ea68f64cf98b5487c47ae1af2aa

Comment by Isabella Siu (Inactive) [ 28/Aug/19 ]

code review url: https://review.gerrithub.io/c/mongodb/mongo-go-driver/+/466625

Comment by David Bartley [ 18/Aug/19 ]

Er, more accurately, that commit (94c698b5b1ee056e5d9995e7d55df59ba62213be) changed things so that MinPoolSize was actually honoured; I believe prior to that commit it was just ignored.

Comment by David Bartley [ 18/Aug/19 ]

Ah, figured it out. This is caused by the new MinPoolSize feature, which was added in the commit I mentioned. That forces connections to be created as part of the Connect call (in resourcePool.Maintain). Here's a small repro:

package main
 
import (
	"log"
 
	"go.mongodb.org/mongo-driver/x/mongo/driver/connstring"
	"go.mongodb.org/mongo-driver/x/mongo/driver/topology"
)
 
func main() {
	connStr := connstring.ConnString{
		Hosts:          []string{"localhost:27017"},
		MinPoolSize:    10,
		MinPoolSizeSet: true,
	}
	log.Printf("topology.New")
	t, err := topology.New(topology.WithConnString(func(connstring.ConnString) connstring.ConnString { return connStr }))
	if err != nil {
		panic(err)
	}
	log.Printf("Connect")
	err = t.Connect()
	if err != nil {
		panic(err)
	}
	log.Printf("OK!")
}

I'm not sure what the best fix is, but it still seems like the real issue is the coarseness of the serversLock.

Comment by David Bartley [ 14/Aug/19 ]

I haven't tested it outside of tests, but it seems to reliably fail a number of tests that simply start a standalone or replset mongod (and then create and connect to a Topology). I've bisected the bad commit to 94c698b5b1ee056e5d9995e7d55df59ba62213be, though it's not clear to me what in particular from that commit would have started triggering this deadlock; it looks like it's always been possible to trigger it, well before that commit.

Comment by Kristofer Brandow (Inactive) [ 14/Aug/19 ]

Hi bartle,

Is this happening all the time when you try to connect the driver or is it only in one specific test?

--Kris

Generated at Thu Feb 08 08:35:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.