[GODRIVER-433] MaxStalenessSupported sporadically fails on supported versions Created: 28/May/18  Updated: 27/Oct/23  Resolved: 19/Nov/18

Status: Closed
Project: Go Driver
Component/s: Server Selection
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Daniels (Inactive) Assignee: Unassigned
Resolution: Gone away Votes: 0
Labels: Stitch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by GODRIVER-441 Connection Handshake should not call ... Closed

 Description   

I'm creating a client with a secondary preferred read preference and max staleness set. During server selection, the version is sometimes missing from the candidate server description (possibly because the server's information has not been looked up yet). Every operation then fails erroneously, because the driver concludes that the feature is not supported.

I think this is a race condition. Say there's just one server in the topology. The topology is only partially discovered, and during selection we check the feature against the partial server description, which is missing its version. A partial description is normally okay, but here the check immediately returns an error instead of recognizing that the description is partial and that no suitable candidate has been selected yet, and then calling RequestImmediateCheck.

EDIT: Upon further inspection, I think this is caused by buildInfo being lost on subsequent heartbeats (see: https://github.com/mongodb/mongo-go-driver/blob/master/core/topology/server.go#L381)

This untested patch may work:

diff --git a/core/topology/server.go b/core/topology/server.go
index 0bbc027..3f05a98 100644
--- a/core/topology/server.go
+++ b/core/topology/server.go
@@ -339,6 +339,7 @@ func (s *Server) heartbeat(conn connection.Connection) (description.Server, conn
     var desc description.Server
     var set bool
     var err error
+    currDesc := s.desc.Load().(description.Server)
     ctx := context.Background()
     for i := 1; i <= maxRetry; i++ {
         if conn != nil && conn.Expired() {
@@ -357,7 +358,10 @@ func (s *Server) heartbeat(conn connection.Connection) (description.Server, conn
             opts = append(opts, connection.WithHandshaker(func(h connection.Handshaker) connection.Handshaker {
                 return nil
             }))
-            conn, _, err = connection.New(ctx, s.address, opts...)
+            var connDesc *description.Server
+            if conn, connDesc, err = connection.New(ctx, s.address, opts...); connDesc != nil {
+                currDesc = *connDesc
+            }
             if err != nil {
                 saved = err
                 if conn != nil {
@@ -378,7 +382,11 @@ func (s *Server) heartbeat(conn connection.Connection) (description.Server, conn
         }
         delay := time.Since(now)
 
-        desc = description.NewServer(s.address, isMaster, result.BuildInfo{}).SetAverageRTT(s.updateAverageRTT(delay))
+        desc = description.NewServer(s.address, isMaster, result.BuildInfo{
+            GitVersion:   currDesc.GitVersion,
+            Version:      currDesc.Version.Desc,
+            VersionArray: currDesc.Version.Parts,
+        }).SetAverageRTT(s.updateAverageRTT(delay))
         desc.HeartbeatInterval = s.cfg.heartbeatInterval
         set = true
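
The idea behind the patch, reduced to a self-contained sketch: keep the build info from the most recent handshake and reuse it on heartbeats that do not perform one. The types below (`buildInfo`, `serverDesc`, `monitor`) are hypothetical stand-ins, not the driver's real `description.Server` or `result.BuildInfo`, and the real heartbeat loop does considerably more.

```go
package main

import "fmt"

// buildInfo and serverDesc are hypothetical, simplified stand-ins for the
// driver's result.BuildInfo and description.Server.
type buildInfo struct {
	GitVersion   string
	VersionParts []int
}

type serverDesc struct {
	Addr  string
	Build buildInfo
}

// monitor remembers the last description it produced for one server.
type monitor struct {
	addr string
	last serverDesc
}

// heartbeat builds the next description. fresh is nil when the heartbeat
// reused an existing connection and therefore learned no build info; in
// that case the previously known build info is carried forward instead of
// being replaced with an empty value.
func (m *monitor) heartbeat(fresh *buildInfo) serverDesc {
	build := m.last.Build
	if fresh != nil {
		build = *fresh
	}
	m.last = serverDesc{Addr: m.addr, Build: build}
	return m.last
}

func main() {
	m := &monitor{addr: "localhost:27017"}

	// First heartbeat opens a connection and learns the version.
	first := m.heartbeat(&buildInfo{GitVersion: "abc123", VersionParts: []int{3, 6, 4}})

	// Second heartbeat reuses the connection; the version must survive.
	second := m.heartbeat(nil)

	fmt.Println(first.Build.VersionParts, second.Build.VersionParts)
}
```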
 



 Comments   
Comment by Jeffrey Yemin [ 29/Oct/18 ]

The driver no longer calls buildInfo, so there should no longer be any partial server descriptions.
