Uploaded image for project: 'Go Driver'
  1. Go Driver
  2. GODRIVER-3145

Don't retry operations on Context cancel/timeout

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 1.15.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Go Drivers
    • Not Needed
    • Hide

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?

      Show
      1. What would you like to communicate to the user about this feature? 2. Would you like the user to see examples of the syntax and/or executable code and its output? 3. Which versions of the driver/connector does this apply to?

      The Go driver will retry operations that fail due to a Context cancellation or timeout. Those operations can never succeed, so should not be retried. The retry consumes resources, emits confusing events, and can cause an unnecessary connection to be created (a new connection is not always created; depends on the timing between network deadlines and Context goroutines).

      Detailed steps to reproduce the problem?

      See repro code here.

      Below is the output of running the repro code. The extra events that shouldn't happen are bolded.

      Initiating MongoDB Client...
      Pool Event: pool_event.connection_pool_created
      Pool Event: pool_event.connection_pool_ready
      Pool Event: pool_event.connection_pool_created
      Pool Event: pool_event.connection_pool_ready
      Pool Event: pool_event.connection_pool_created
      Pool Event: pool_event.connection_pool_ready
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_created
      Pool Event: pool_event.connection_ready
      Pool Event: pool_event.connection_checked_out
      Pool Event: pool_event.connection_checked_in
      Connected to MongoDB!
      Starting fast Mongo Query
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_checked_out
      Pool Event: pool_event.connection_checked_in
      Fast query was successful
      Starting slow/timeout Mongo Query with manual context cancel
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_checked_out
      Manually cancelling slow query
      Pool Event: pool_event.connection_checked_in
      Pool Event: pool_event.connection_closed
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_check_out_failed
      Slow Query Manual Cancel Error: connection(localhost:27017[-8]) incomplete read of message header: context canceled
      Starting slow/timeout Mongo Query with context deadline exceeded
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_created
      Pool Event: pool_event.connection_ready
      Pool Event: pool_event.connection_checked_out
      Pool Event: pool_event.connection_checked_in
      Pool Event: pool_event.connection_closed
      Pool Event: pool_event.connection_check_out_started
      Pool Event: pool_event.connection_check_out_failed
      Slow Query Deadline Exceeded Error: connection(localhost:27017[-15]) incomplete read of message header: context deadline exceeded
      Closing mongodb client
      Pool Event: pool_event.connection_pool_closed
      Pool Event: pool_event.connection_pool_closed
      Pool Event: pool_event.connection_pool_closed

      The root cause seems to be that driver.Error.RetryableRead returns true, even if the underlying error is a Context error. Below is a log with stack trace of a call to the resetForRetry function in Operation.Execute with a "context deadline exceeded" error. The driver shouldn't attempt to retry a Context error because it will never succeed.

      resetForRetry(err: connection(localhost:27017[-16]) incomplete read of message header: context deadline exceeded)
      goroutine 1 [running]:
      runtime/debug.Stack()
              /usr/local/go/src/runtime/debug/stack.go:24 +0x64
      runtime/debug.PrintStack()
              /usr/local/go/src/runtime/debug/stack.go:16 +0x1c
      go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute.func1({0x100a44bf8?, 0x14000446540})
              /mongo-go-driver/x/mongo/driver/operation.go:425 +0xb4
      go.mongodb.org/mongo-driver/x/mongo/driver.Operation.Execute({0x1400041eef0, {0x10085b2a7, 0x14}, {0x100a469b0, 0x1400015a0c0}, 0x1400041ef00, {0x100a44b78, 0x140004164a0}, 0x100d4b600, 0x14000020a20, ...}, ...)
              /mongo-go-driver/x/mongo/driver/operation.go:775 +0x2680
      go.mongodb.org/mongo-driver/x/mongo/driver/operation.(*Find).Execute(0x1400058a000, {0x100a484e0, 0x14000446380})
              /mongo-go-driver/x/mongo/driver/operation/find.go:108 +0x18c
      go.mongodb.org/mongo-driver/mongo.(*Collection).Find(0x1400020e4e0, {0x100a484e0?, 0x14000446380?}, {0x1009ccb20?, 0x140004052c0?}, {0x14000444048, 0x1, 0x1})
              /mongo-go-driver/mongo/collection.go:1307 +0xf0c
      go.mongodb.org/mongo-driver/mongo.(*Collection).FindOne(0x1400020e4e0, {0x100a484e0, 0x14000446380}, {0x1009ccb20, 0x140004052c0}, {0x0, 0x0, 0x11?})
              /mongo-go-driver/mongo/collection.go:1364 +0x438
      main.main()
              /main.go:98 +0x7d4
      

      Definition of done: what must be done to consider the task complete?

      The exact Go version used, with patch level:

      Go 1.21.1

      The exact version of the Go driver used:

      Stack trace and logs made with v1.11.7, but also able to repro with v1.14.0

      Describe how MongoDB is set up. Local vs Hosted, version, topology, load balanced, etc.

      Locally hosted 3-node MongoDB 6.0 replica set, with enableTestCommands=1 parameter set.

      The operating system and version (e.g. Windows 7, OSX 10.8, ...)

      macOS 14.3

            Assignee:
            matt.dale@mongodb.com Matt Dale
            Reporter:
            matt.dale@mongodb.com Matt Dale
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: