[GODRIVER-1301] Data race around topology subscription Created: 05/Sep/19 Updated: 28/Oct/23 Resolved: 25/Sep/19 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | Core API |
| Affects Version/s: | 1.1.0 |
| Fix Version/s: | 1.1.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Bartley | Assignee: | Isabella Siu (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
We have a simple goroutine that subscribes to the topology and then prints some log messages each time it receives an update. One of our tests caught a data race:
I think what's happening is that Topology.apply is calling description.DiffTopology, which sorts in-place, on t.fsm.Topology, but it's still the case that a previous topology subscriber (i.e. our code) is running, presumably in response to a previous topology update. It feels like a bug that description.DiffTopology sorts in-place, though I don't think changing that would be sufficient, since fsm.go itself mutates Servers as well. |
| Comments |
| Comment by Githook User [ 25/Sep/19 ] | ||||||||
|
Author: {'username': 'bartle-stripe', 'email': 'bartle@stripe.com', 'name': 'David Bartley'}Message: Don't mutate inputs in `DiffTopology` or `Topology.DiffHostlist`. Also guard against a similar race in `compareHosts`.
| ||||||||
| Comment by Githook User [ 24/Sep/19 ] | ||||||||
|
Author: {'name': 'David Bartley', 'username': 'bartle-stripe', 'email': 'bartle@stripe.com'}Message: Don't mutate inputs in `DiffTopology` or `Topology.DiffHostlist`. Also guard against a similar race in `compareHosts`.
| ||||||||
| Comment by David Bartley [ 17/Sep/19 ] | ||||||||
| Comment by Isabella Siu (Inactive) [ 16/Sep/19 ] | ||||||||
|
Hi bartle, If you have a patch, we are always happy to accept PRs. | ||||||||
| Comment by David Bartley [ 12/Sep/19 ] | ||||||||
|
Would it be helpful for us to provide a patch to fix this, or is someone already working on this? | ||||||||
| Comment by David Bartley [ 05/Sep/19 ] | ||||||||
|
Another observation is that fsm.go mitigates this by having apply do:
Specifically, that'll cause a deep copy of the Topology to be made, so when things like addServer are called, it's only mutating the copy (of the Servers slice), rather than the original Topology, which might still be being read by a subscriber, or anything else that touches Topology. Relatedly, I was able to trigger a data race (but not reliability) by merely calling topology.Description(). The write side of the race is the same as originally posted, while the read is around for _, s := range topology.Description().Servers {. | ||||||||
| Comment by David Bartley [ 05/Sep/19 ] | ||||||||
|
I suspect DiffHostlist has the same problem; more generally, anything that mutates fsm.Topology outside of fsm.go (and its locks) is presumably racy. |