[GODRIVER-1076] Watching a ChangeStream doesn't handle Disconnect/Reconnect
Created: 21/May/19  Updated: 27/Oct/23  Resolved: 31/May/19

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Andreas Schneider
Assignee: Divjot Arora (Inactive)
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

I am using the following code:

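ctx := context.Background()

c := h.db.Collection("col")
cs, err := c.Watch(ctx, []bson.M{})
if err != nil {
	log.Println("cannot watch for changes", err)
	return // cs is nil here, so the deferred cs.Close below would panic
}
defer cs.Close(ctx)

// Next blocks until an event arrives, an error occurs, or ctx is done.
for cs.Next(ctx) {
	doc := bson.M{}
	if err := cs.Decode(&doc); err != nil {
		log.Println("cannot decode change event", err)
		continue
	}

	// do something with doc
}

if cs.Err() != nil {
	log.Println("error reading changestream", cs.Err())
}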

 
When the MongoDB server goes down or is restarted, the for loop hangs and receives neither further data nor an error.
Working with a time-constrained context seems wrong, since I actually want to listen indefinitely.
IMHO the driver should either handle the reconnect silently and set up the change stream again, or properly signal the lost session/connection by returning from the Next() call with an error, so that I have a chance to re-establish the change stream myself.
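
A minimal sketch of the second option, assuming Next() returns false with an error once the connection is lost: the caller reopens the change stream in an outer loop until the context is cancelled. The names `stream`, `watchWithRetry`, `fakeStream`, and `runDemo` are hypothetical and introduced only for illustration. The interface lists just the `Next`, `Err`, and `Close` methods that `*mongo.ChangeStream` provides, and the fake stands in for a real server so the example runs with the standard library alone.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// stream is the subset of *mongo.ChangeStream methods this sketch relies on.
type stream interface {
	Next(ctx context.Context) bool
	Err() error
	Close(ctx context.Context) error
}

// watchWithRetry is a hypothetical helper: it reopens the stream every time
// the inner loop ends and returns only once ctx is done.
func watchWithRetry(
	ctx context.Context,
	open func(context.Context) (stream, error),
	handle func(stream),
	backoff time.Duration,
) error {
	for {
		cs, err := open(ctx)
		if err == nil {
			for cs.Next(ctx) {
				handle(cs)
			}
			err = cs.Err()
			cs.Close(ctx)
		}
		if ctx.Err() != nil {
			return ctx.Err() // the caller asked us to stop
		}
		fmt.Println("change stream ended, reopening:", err)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff): // wait a little before reopening
		}
	}
}

// fakeStream simulates a stream that delivers `left` events and then fails.
type fakeStream struct{ left int }

func (f *fakeStream) Next(context.Context) bool {
	if f.left == 0 {
		return false
	}
	f.left--
	return true
}
func (f *fakeStream) Err() error                  { return errors.New("connection lost") }
func (f *fakeStream) Close(context.Context) error { return nil }

// runDemo handles four events across two simulated connections, then cancels.
func runDemo() (events, opens int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	open := func(context.Context) (stream, error) {
		opens++
		return &fakeStream{left: 2}, nil
	}
	handle := func(stream) {
		events++
		if events == 4 {
			cancel()
		}
	}
	err = watchWithRetry(ctx, open, handle, time.Millisecond)
	return events, opens, err
}

func main() {
	events, opens, err := runDemo()
	fmt.Printf("events=%d opens=%d err=%v\n", events, opens, err)
}
```

With the real driver, `open` would wrap `c.Watch(ctx, ...)` and `handle` would call `cs.Decode`. Resuming from the last seen event instead of starting a fresh stream is left out of this sketch.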



 Comments   
Comment by Andreas Schneider [ 28/May/19 ]

I guess I have to take this back. I can no longer reproduce the described behavior on driver version 1.0.2. Either something changed or my circumstances were different. However, it now seems to return properly when the server connection times out, even if the context has no timeout. That way I can reconnect properly. Also, if the server restarts quickly enough, the change stream simply resumes. So it behaves exactly as I said it should; I have no idea why it did not before.

Sorry for the trouble.

Comment by Andreas Schneider [ 26/May/19 ]

The events would not be generated by this code. I'm simply listening to everything in the Change Stream. So whenever some other process modifies the collection, I get an event.

The problem is basically what you said: the code will loop until the given context times out.

What if I don't specify a timeout because I want to listen endlessly? The program in question needs to react to changes in the collection, so I listen in an endless loop.

When the connection to MongoDB is interrupted (network issue, server offline, whatever), the method cs.Next(ctx) keeps blocking. That is reasonable insofar as the context should control the lifetime of the call. However, after the reconnect to MongoDB that call NEVER unblocks, apparently because the change stream is not set up again after the reconnect (new session, I guess?).

So IMO either the change stream should be re-established on reconnect, or the cs.Next() call should return false with an error set to indicate that the connection was lost. Currently the Go driver offers me no way to deal with reconnects.

Comment by Divjot Arora (Inactive) [ 24/May/19 ]

aksdb Can you provide more information about where the events are being generated? The linked code wouldn't generate any events and the driver's change stream code will loop until the given context times out or an event is returned from the server.

Generated at Thu Feb 08 08:35:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.