[GODRIVER-1301] Data race around topology subscription Created: 05/Sep/19  Updated: 28/Oct/23  Resolved: 25/Sep/19

Status: Closed
Project: Go Driver
Component/s: Core API
Affects Version/s: 1.1.0
Fix Version/s: 1.1.2

Type: Bug Priority: Major - P3
Reporter: David Bartley Assignee: Isabella Siu (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We have a simple goroutine that subscribes to the topology and then prints some log messages each time it receives an update. One of our tests caught a data race:

03:48:08 Read at 0x00c00153a000 by goroutine 301:
Our code that does "for toplogy := range subscription.C { ... }" and logs details of topology.
 
03:48:08 Previous write at 0x00c00153a000 by goroutine 130:
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/description.(*serverSorter).Swap()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/description/topology.go:133 +0x149
03:48:08   sort.medianOfThree()
03:48:08       GOROOT/src/sort/sort.go:77 +0x14b
03:48:08   sort.doPivot()
03:48:08       GOROOT/src/sort/sort.go:105 +0x9b
03:48:08   sort.quickSort()
03:48:08       GOROOT/src/sort/sort.go:190 +0xa7
03:48:08   sort.Sort()
03:48:08       GOROOT/src/sort/sort.go:218 +0x8a
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/description.DiffTopology()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/description/topology.go:48 +0xba
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).apply()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/topology.go:540 +0x365
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer.func1()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/topology.go:578 +0xf8
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).updateDescription()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:431 +0x345
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:381 +0x2cc
 
03:48:08 Goroutine 130 (running) created at:
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connect()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:170 +0x160
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.ConnectServer()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:113 +0x9d
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/topology.go:580 +0x181
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).apply()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/topology.go:554 +0x5e9
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).addServer.func1()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/topology.go:578 +0xf8
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).updateDescription()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:431 +0x345
03:48:08   go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update()
03:48:08       external/org_mongodb_go_mongo_driver/x/mongo/driver/topology/server.go:381 +0x2cc

I think what's happening is that Topology.apply is calling description.DiffTopology, which sorts in-place, on t.fsm.Topology, but it's still the case that a previous topology subscriber (i.e. our code) is running, presumably in response to a previous topology update. It feels like a bug that description.DiffTopology sorts in-place, though I don't think changing that would be sufficient, since fsm.go itself mutates Servers as well.



 Comments   
Comment by Githook User [ 25/Sep/19 ]

Author:

{'username': 'bartle-stripe', 'email': 'bartle@stripe.com', 'name': 'David Bartley'}

Message: Don't mutate inputs in `DiffTopology` or `Topology.DiffHostlist`.

Also guard against a similar race in `compareHosts`.

GODRIVER-1301
Branch: release/1.1
https://github.com/mongodb/mongo-go-driver/commit/84ddf5afea657e155f7eef9ee9328f93197860ed

Comment by Githook User [ 24/Sep/19 ]

Author:

{'name': 'David Bartley', 'username': 'bartle-stripe', 'email': 'bartle@stripe.com'}

Message: Don't mutate inputs in `DiffTopology` or `Topology.DiffHostlist`.

Also guard against a similar race in `compareHosts`.

GODRIVER-1301
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/23eff18e2f3b5a3a112015235fd8ece21c543508

Comment by David Bartley [ 17/Sep/19 ]

https://github.com/mongodb/mongo-go-driver/pull/175

Comment by Isabella Siu (Inactive) [ 16/Sep/19 ]

Hi bartle,

If you have a patch, we are always happy to accept PRs.

Comment by David Bartley [ 12/Sep/19 ]

Would it be helpful for us to provide a patch to fix this, or is someone already working on this?

Comment by David Bartley [ 05/Sep/19 ]

Another observation is that fsm.go mitigates this by having apply do:

    newServers := make([]description.Server, len(f.Servers))
    copy(newServers, f.Servers)
 
    oldMinutes := f.SessionTimeoutMinutes
    f.Topology = description.Topology{
        Kind:    f.Kind,
        Servers: newServers,
    }

Specifically, that'll cause a deep copy of the Topology to be made, so when things like addServer are called, it's only mutating the copy (of the Servers slice), rather than the original Topology, which might still be being read by a subscriber, or anything else that touches Topology.

Relatedly, I was able to trigger a data race (but not reliability) by merely calling topology.Description(). The write side of the race is the same as originally posted, while the read is around for _, s := range topology.Description().Servers {.

Comment by David Bartley [ 05/Sep/19 ]

I suspect DiffHostlist has the same problem; more generally, anything that mutates fsm.Topology outside of fsm.go (and its locks) is presumably racy.

Generated at Thu Feb 08 08:36:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.