-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
None
-
Go Drivers
-
Not Needed
-
The Go driver will retry operations that fail due to a Context cancellation or timeout. Those operations can never succeed, so should not be retried. The retry consumes resources, emits confusing events, and can cause an unnecessary connection to be created (a new connection is not always created; depends on the timing between network deadlines and Context goroutines).
Detailed steps to reproduce the problem?
See repro code here.
Below is the output of running the repro code. The extra events that shouldn't happen are bolded.
Initiating MongoDB Client...
Pool Event: pool_event.connection_pool_created
Pool Event: pool_event.connection_pool_ready
Pool Event: pool_event.connection_pool_created
Pool Event: pool_event.connection_pool_ready
Pool Event: pool_event.connection_pool_created
Pool Event: pool_event.connection_pool_ready
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_created
Pool Event: pool_event.connection_ready
Pool Event: pool_event.connection_checked_out
Pool Event: pool_event.connection_checked_in
Connected to MongoDB!
Starting fast Mongo Query
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_checked_out
Pool Event: pool_event.connection_checked_in
Fast query was successful
Starting slow/timeout Mongo Query with manual context cancel
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_checked_out
Manually cancelling slow query
Pool Event: pool_event.connection_checked_in
Pool Event: pool_event.connection_closed
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_check_out_failed
Slow Query Manual Cancel Error: connection(localhost:27017[-8]) incomplete read of message header: context canceled
Starting slow/timeout Mongo Query with context deadline exceeded
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_created
Pool Event: pool_event.connection_ready
Pool Event: pool_event.connection_checked_out
Pool Event: pool_event.connection_checked_in
Pool Event: pool_event.connection_closed
Pool Event: pool_event.connection_check_out_started
Pool Event: pool_event.connection_check_out_failed
Slow Query Deadline Exceeded Error: connection(localhost:27017[-15]) incomplete read of message header: context deadline exceeded
Closing mongodb client
Pool Event: pool_event.connection_pool_closed
Pool Event: pool_event.connection_pool_closed
Pool Event: pool_event.connection_pool_closed
The root cause seems to be that driver.Error.RetryableRead returns true, even if the underlying error is a Context error. Below is a log with stack trace of a call to the resetForRetry function in Operation.Execute with a "context deadline exceeded" error. The driver shouldn't attempt to retry a Context error because it will never succeed.
resetForRetry(err: connection(localhost:27017[-16]) incomplete read of message header: context deadline exceeded) goroutine 1 [running]: runtime/debug.Stack() /usr/local/go/src/runtime/debug/stack.go:24 +0x64 runtime/debug.PrintStack() /usr/local/go/src/runtime/debug/stack.go:16 +0x1c go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute.func1({0x100a44bf8?, 0x14000446540}) /mongo-go-driver/x/mongo/driver/operation.go:425 +0xb4 go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute({0x1400041eef0, {0x10085b2a7, 0x14}, {0x100a469b0, 0x1400015a0c0}, 0x1400041ef00, {0x100a44b78, 0x140004164a0}, 0x100d4b600, 0x14000020a20, ...}, ...) /mongo-go-driver/x/mongo/driver/operation.go:775 +0x2680 go.mongodb.org/mongo-driver/x/mongo/driver/operation.(*Find).Execute(0x1400058a000, {0x100a484e0, 0x14000446380}) /mongo-go-driver/x/mongo/driver/operation/find.go:108 +0x18c go.mongodb.org/mongo-driver/mongo.(*Collection).Find(0x1400020e4e0, {0x100a484e0?, 0x14000446380?}, {0x1009ccb20?, 0x140004052c0?}, {0x14000444048, 0x1, 0x1}) /mongo-go-driver/mongo/collection.go:1307 +0xf0c go.mongodb.org/mongo-driver/mongo.(*Collection).FindOne(0x1400020e4e0, {0x100a484e0, 0x14000446380}, {0x1009ccb20, 0x140004052c0}, {0x0, 0x0, 0x11?}) /mongo-go-driver/mongo/collection.go:1364 +0x438 main.main() /main.go:98 +0x7d4
Definition of done: what must be done to consider the task complete?
The exact Go version used, with patch level:
Go 1.21.1
The exact version of the Go driver used:
Stack trace and logs made with v1.11.7, but also able to repro with v1.14.0
Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.
Locally hosted 3-node MongoDB 6.0 replica set, with enableTestCommands=1 parameter set.
The operating system and version (e.g. Windows 7, OSX 10.8, ...)
macOS 14.3