[GODRIVER-648] Collection usage sometimes leads to CursorNotFound Created: 22/Nov/18  Updated: 27/Oct/23  Resolved: 15/Dec/18

Status: Closed
Project: Go Driver
Component/s: Connections, Server Selection
Affects Version/s: 0.0.18
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ivan Strelkov Assignee: Jeffrey Yemin
Resolution: Works as Designed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

After moving from `mgo` to `mongo-go-driver`, we started seeing CursorNotFound errors when fetching documents from one collection:

(CursorNotFound) Cursor not found (namespace: 'arbiter_stage.memberships', id: 2282861465586709065).

Here is the code used to search documents (error is triggered from cursor.Err() check):

// FindAll - simple method for reading multiple documents into array
func (coll Collection) FindAll(ctx context.Context, query interface{}, result interface{}, opts ...*options.FindOptions) error {
	opts = append(opts, options.Find().SetNoCursorTimeout(true))
 
	var err error
 
	// Based on .All() method in go-mgo https://github.com/globalsign/mgo/blob/master/session.go#L4428
	resultv := reflect.ValueOf(result)
	if resultv.Kind() != reflect.Ptr {
		panic("result argument must be a slice address")
	}
 
	slicev := resultv.Elem()
 
	if slicev.Kind() == reflect.Interface {
		slicev = slicev.Elem()
	}
	if slicev.Kind() != reflect.Slice {
		panic("result argument must be a slice address")
	}
 
	cursor, err := coll.Find(ctx, query, opts...)
	if err != nil {
		return err
	}
	defer cursor.Close(ctx)
 
	slicev = slicev.Slice(0, slicev.Cap())
	elemt := slicev.Type().Elem()
	i := 0
	for cursor.Next(ctx) {
		if slicev.Len() == i {
			// If we went beyond the slice capacity, we need to grow the slice
			elemp := reflect.New(elemt)
			err = cursor.Decode(elemp.Interface())
			if err != nil {
				return err
			}
			slicev = reflect.Append(slicev, elemp.Elem())
			slicev = slicev.Slice(0, slicev.Cap())
		} else {
			// If we are still within the slice capacity, we can just overwrite the value at index i
			err = cursor.Decode(slicev.Index(i).Addr().Interface())
			if err != nil {
				return err
			}
		}
		i++
	}
	// Overwrite original result value with new one
	resultv.Elem().Set(slicev.Slice(0, i))
 
	if err := cursor.Err(); err != nil {
		return err
	}
 
	return nil
}

 

Once the error occurs, it keeps recurring for a while and then stops on its own.

We noticed that the error goes away after restarting the Docker container that runs the Go code.

 

Driver version:

0.0.18 ( dbffb1c211bf96fbd721b4bd6911331e5c1886ab )

 

Go version:

go version go1.11 linux/amd64

mongos version:

mongos version v3.6.9
git version: 167861a164723168adfaaa866f310cb94010428f
OpenSSL version: OpenSSL 1.1.0f  25 May 2017
allocator: tcmalloc
modules: none
build environment:
    distmod: debian92
    distarch: x86_64
    target_arch: x86_64



 Comments   
Comment by Jeffrey Yemin [ 30/Nov/18 ]

I suspect the issue is that in this setup you are essentially using DNS as a load balancer, and the MongoDB wire protocol only supports load balancers that are configured to use "sticky sessions", which essentially means that for a given client IP every connection from that IP will connect to the same mongos server. Using DNS in this way doesn't fit that definition.

I'm not exactly sure why this configuration works with mgo, but it will not work with any of the MongoDB-supported drivers. As a workaround, you will have to stop using multiple A records on a hostname and instead list each of the mongos servers explicitly in the connection string.
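The workaround above amounts to building a seed list that names every mongos explicitly. Here is a minimal sketch of what that might look like, using only the standard library. The host names and the `buildMongosURI` helper are hypothetical placeholders, not taken from the reporter's deployment or from the driver.

```go
package main

import (
	"fmt"
	"strings"
)

// buildMongosURI joins explicit mongos addresses into one connection string.
// This is an illustrative helper, not part of mongo-go-driver.
func buildMongosURI(hosts []string) string {
	return "mongodb://" + strings.Join(hosts, ",")
}

func main() {
	// Hypothetical addresses: one entry per mongos instead of a single
	// hostname that DNS resolves to several instances.
	uri := buildMongosURI([]string{
		"mongos1.example.com:27017",
		"mongos2.example.com:27017",
	})
	fmt.Println(uri)
}
```

With every mongos listed in the string, the driver can see each instance as a separate server rather than one address that may resolve differently on each new connection.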

Comment by Ivan Strelkov [ 30/Nov/18 ]

I see that the issue type was changed to question. Likely it means we did something wrong.

Could you please tell us what was wrong on our side?

Comment by Ivan Strelkov [ 27/Nov/18 ]

We had two mongos instances with different IP addresses. The connection string was `mongodb://<hostname>`, where <hostname> resolves via DNS to both instances.
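One way to confirm a setup like this is to resolve the hostname and count the addresses it returns. The sketch below assumes a hypothetical hostname, and `resolvesToMany` is an illustrative helper. It takes the lookup function as a parameter so it can be stubbed, and passes `net.LookupHost` in normal use. Note that `net.LookupHost` may return both IPv4 and IPv6 addresses, so more than one result does not always mean more than one server.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc matches the signature of net.LookupHost so it can be stubbed.
type lookupFunc func(host string) ([]string, error)

// resolvesToMany reports whether host resolves to more than one address
// and returns the addresses it found.
func resolvesToMany(lookup lookupFunc, host string) (bool, []string, error) {
	addrs, err := lookup(host)
	if err != nil {
		return false, nil, err
	}
	return len(addrs) > 1, addrs, nil
}

func main() {
	// "mongos.example.com" is a placeholder for the real hostname.
	many, addrs, err := resolvesToMany(net.LookupHost, "mongos.example.com")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("multiple addresses:", many, addrs)
}
```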

 

Comment by Ian Whalen (Inactive) [ 26/Nov/18 ]

istrel, is it possible that the connection string you're using points to a load balancer rather than to the mongos endpoints?
