[CSHARP-775] Tailable cursor blocks forever when connection is lost Created: 15/Jul/13 Updated: 19/Oct/15 Resolved: 21/Oct/13 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Flavien | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | Windows, azure | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Mongo 2.4.3 |
||
| Description |
|
I'm using the MongoDB C# driver 1.8.1.20 with Mongo 2.4.3. I use the following infinite loop to poll new messages from a capped collection and process them as they come (with a tailable cursor and await data). It works for the most part, but in production, it seems that from time to time the call to enumerator.MoveNext() blocks and never returns. This causes the loop to stall, and my application no longer receives updates. It seems to be happening when the connection is closed unexpectedly (in particular, during a VIP Swap on Windows Azure).
The GetCursor function does this:
Either the disconnection should be detected and an exception be thrown, or there should be a timeout exception. The call should not block forever. |
| Comments |
| Comment by Brian Anderton [ 19/Oct/15 ] |
|
Yes, same issue for same reason (Azure VIP swap causes tailable cursor to hang forever). It would seem that you could wrap the async read call with a manual timer which kills the socket when the specified timeout is reached. I created a separate issue with this suggestion: |
| Comment by Craig Wilson [ 19/Oct/15 ] |
|
Hi Brian, Asynchronous sockets do not obey read and write timeouts. We still set these settings, but they simply don't apply. I assume that you are encountering the same issue for the same reasons? If that is the case, I don't believe there is anything we can do about it. The driver's connection is in a state where we have pushed a query to the server and are sitting on the socket waiting for bytes to show up. In this case, nothing shows up and we cannot know why. I would be happy to be wrong about this, so if you know of something, please let me know. We are adding a sync stack into 2.2 (probably) which will begin obeying the socket read and write timeouts again. I think you'll just need to wait until then. Craig |
| Comment by Brian Anderton [ 19/Oct/15 ] |
|
This workaround no longer works in the 2.x async driver, because SocketTimeout merely sets the NetworkStream ReadTimeout property, which is ignored for asynchronous socket operations. It appears that a manual timer is applied to async connects, but not to reads. |
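The manual read timer suggested in this thread could be sketched roughly as follows. This is only an illustration built on the .NET base class library, not code from the driver: the `TimedRead` class and `ReadWithTimeoutAsync` method are hypothetical names, and the sketch assumes it is acceptable to dispose the stream (and so the underlying socket) once the timer fires.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class TimedRead
{
    // Hypothetical helper, not part of the MongoDB driver.
    // Races an asynchronous read against a timer and disposes the stream if the timer wins.
    public static async Task<int> ReadWithTimeoutAsync(
        Stream stream, byte[] buffer, int offset, int count, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task<int> readTask = stream.ReadAsync(buffer, offset, count);
            Task delayTask = Task.Delay(timeout, cts.Token);

            Task finished = await Task.WhenAny(readTask, delayTask).ConfigureAwait(false);
            if (finished == readTask)
            {
                // The read completed (or faulted) first: stop the timer and surface the result.
                cts.Cancel();
                return await readTask.ConfigureAwait(false);
            }

            // The timer fired first: dispose the stream so the pending read is abandoned.
            stream.Dispose();

            // Observe the fault that the orphaned read is expected to raise after disposal.
            _ = readTask.ContinueWith(
                t => { var ignored = t.Exception; },
                TaskContinuationOptions.OnlyOnFaulted);

            throw new TimeoutException(
                "No data was received within " + timeout + "; the stream was closed.");
        }
    }
}
```

After a timeout the connection is unusable, so the caller would have to open a new connection and re-create the tailable cursor.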
| Comment by Craig Wilson [ 21/Oct/13 ] |
|
Hi Flavien, Thanks for the suggestion. We discussed the issue and have decided to leave the SocketTimeout alone. 0 is the operating system default and we prefer to leave it that way. Thanks again, |
| Comment by Flavien [ 23/Jul/13 ] |
|
Yes, that fixed the problem, thanks a lot. I would suggest changing the default SocketTimeout to a non-zero value, as the current default can cause hard-to-debug problems for people who rely on it (as in this situation). |
| Comment by Flavien [ 19/Jul/13 ] |
|
I will definitely try that and let you know if that fixes the problem. |
| Comment by Craig Wilson [ 19/Jul/13 ] |
|
If that is the case, then there really isn't much we can do about it... We are basically sitting on the connection waiting for data to show up, and as long as the connection is open, we have no reason to think anything else. MongoClientSettings has a SocketTimeout property you can set. By default, it uses the OS default, which on Windows is infinite (I believe). You could try setting it low (30 seconds, 60 seconds) and see if that fixes your issue. |
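A minimal sketch of the SocketTimeout suggestion for the 1.x driver, assuming a server on localhost:27017. The `ClientFactory` class and `CreateClient` method are hypothetical names, and the 30-second value is just one of the values floated above. A later comment in this thread reports that the setting is not honored for reads in the 2.x async driver.

```csharp
using System;
using MongoDB.Driver;

public static class ClientFactory
{
    // Hypothetical helper; host, port, and timeout value are placeholders.
    public static MongoClient CreateClient()
    {
        var settings = new MongoClientSettings
        {
            Server = new MongoServerAddress("localhost", 27017),
            // Non-default socket timeout: a read that receives no bytes within
            // this window should fail instead of waiting indefinitely.
            SocketTimeout = TimeSpan.FromSeconds(30)
        };
        return new MongoClient(settings);
    }
}
```

With this in place, the polling loop would need to catch the resulting exception and re-create the tailable cursor.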
| Comment by Flavien [ 19/Jul/13 ] |
|
The IP address of the load balancer in front of the machine changes, so I think it simply causes the TCP connections to time out. |
| Comment by Craig Wilson [ 17/Jul/13 ] |
|
Ok, that's helpful. Do you happen to know if it closes the connections or just kinda leaves them open and tries to do some sort of routing? |
| Comment by Flavien [ 17/Jul/13 ] |
|
Thanks. I am not sure if there is an easier way to reproduce this, but when doing a VIP swap on Windows Azure, it reproduces 100% of the time (I think the load balancer does something funky with the open connections during a VIP swap). |
| Comment by Craig Wilson [ 16/Jul/13 ] |
|
Hi Flavien, |