[CSHARP-187] BsonBuffer waiting indefinitely on broken connection Created: 31/Mar/11 Updated: 02/Apr/15 Resolved: 01/Apr/11 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | None |
| Affects Version/s: | 1.0 |
| Fix Version/s: | 1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Robert Stam |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
BsonBuffer.LoadFrom contains the following loop:

while (bytesPending > 0) { else { localChunkOffset += bytesRead; bytesPending -= bytesRead; }}

The "timeout" is a necessity here, since when the connection is gracefully closed from the MongoDB side (by stopping the server), the client connection remains active:

This leads to an endless loop that just hangs the client thread when the server is gone. |
| Comments |
| Comment by Aristarkh Zagorodnikov [ 05/Apr/11 ] |
|
Thanks for the fix, I'll put it through some heavy testing (multiple occasionally crashing servers in a cluster) in a few days. |
| Comment by Robert Stam [ 05/Apr/11 ] |
|
Pushed a new fix for this bug based on Aristarkh's research. We now treat a return value of zero from NetworkStream.Read as end of stream, so no timeout is needed. |
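For illustration, a read loop that treats a zero return as end of stream could look like the sketch below. This is not the driver's actual BsonBuffer.LoadFrom code: the class name StreamReadSketch and the helper FillBuffer are hypothetical, and the only assumption is the behavior discussed in this ticket, namely that Stream.Read returns 0 once the remote end has closed the connection.

```csharp
using System;
using System.IO;

public static class StreamReadSketch
{
    // Hypothetical helper, not the driver's BsonBuffer.LoadFrom.
    // Reads exactly `count` bytes into `buffer`, or throws if the stream ends first.
    public static void FillBuffer(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        int bytesPending = count;
        while (bytesPending > 0)
        {
            int bytesRead = stream.Read(buffer, offset, bytesPending);
            if (bytesRead == 0)
            {
                // Zero bytes means end of stream (the remote end closed the
                // connection). Looping again would never make progress.
                throw new EndOfStreamException(
                    "Stream ended with " + bytesPending + " bytes still pending.");
            }
            offset += bytesRead;
            bytesPending -= bytesRead;
        }
    }
}
```

With a check like this, a caller sees an exception as soon as the server side goes away instead of spinning in the loop, and no read timeout is involved.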
| Comment by Aristarkh Zagorodnikov [ 02/Apr/11 ] |
|
I also checked the Mono project docs for NetworkStream.Read() (http://www.go-mono.com/docs/monodoc.ashx?link=M%3aSystem.Net.Sockets.NetworkStream.Read(System.Byte%5b%5d%2cSystem.Int32%2cSystem.Int32)). So it appears that Mono handles it in the same way I described above: |
| Comment by Aristarkh Zagorodnikov [ 02/Apr/11 ] |
|
I did a little research and it appears that the timeout might be unnecessary altogether. The documentation on NetworkStream.Read (http://msdn.microsoft.com/en-us/library/xxst1299.aspx) says:

It looks like the "If no data is available for reading, the Read method returns 0" part is plainly wrong; see this link: http://social.msdn.microsoft.com/forums/en-US/netfxnetcom/thread/6dbd1467-6a90-43bb-abaa-df5ca5a1cd85

I used .NET Reflector to check how NetworkStream.Read() is implemented:

catch (Exception exception) throw new IOException(SR.GetString("net_io_readfailure", new object[] { exception.Message }), exception);

So it appears that it's just a wrapper around Socket.Receive(), and its documentation (http://msdn.microsoft.com/en-us/library/w3xtz6a5.aspx) says:

Note that the Socket.Receive() documentation explicitly mentions that it blocks if it can't read at least some of the data, and returns zero only if the remote end is shut down. So it appears that the documentation contains old, incorrect text (I believe it wasn't wrong around .NET 1.1, but the behavior changed with 2.0), and Mike Flasko (I think he was the System.Net sub-library PM at that time) acknowledges that. It looks like that fix never got in over the last five years, though.

I agree that this might need additional testing (implementing this wrong might break existing applications completely; I volunteer to help if you need to repro something), but I think that getting immediate disconnection detection (not to mention the warm feeling of "done right") would be a very good thing to have =) |
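The claim that a blocking read returns zero only after the remote end shuts down can be checked with a small loopback experiment. The sketch below is not part of the original report: the class name ReadAfterCloseSketch and the method ReadAfterPeerClose are hypothetical, and it uses a local TcpListener in place of a real MongoDB server.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ReadAfterCloseSketch
{
    // Hypothetical experiment: connect to a local listener, let the accepting
    // side close its end gracefully, then report what NetworkStream.Read returns.
    public static int ReadAfterPeerClose()
    {
        // Port 0 asks the OS to pick any free port.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (var client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);

                // Accept the connection and immediately dispose the server-side
                // socket, which closes it without sending any data.
                using (TcpClient serverSide = listener.AcceptTcpClient())
                {
                }

                // Blocks until data arrives or the peer's shutdown is seen.
                var buffer = new byte[16];
                return client.GetStream().Read(buffer, 0, buffer.Length);
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

If the analysis in this comment is right, ReadAfterPeerClose should return 0 rather than block or throw, which is the signal a read loop can use to detect the disconnect immediately.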
| Comment by Robert Stam [ 01/Apr/11 ] |
|
Fixed. Added timeout to BsonBuffer.LoadFrom. |
| Comment by Aristarkh Zagorodnikov [ 01/Apr/11 ] |
|
Good to know you got it. I look forward to the C# driver being a very robust one, since I'm going to put it to heavy use in the near future in an environment where a server going down is the norm =) |
| Comment by Robert Stam [ 01/Apr/11 ] |
|
When running the server on Ubuntu I can sometimes reproduce this. It depends on the exact moment that the server is killed. 1. If the server is killed and the next operation on the client is SendMessage, an exception is thrown. So I guess the timeout is necessary after all. Will put it in. Thanks for reporting this! |
| Comment by Robert Stam [ 01/Apr/11 ] |
|
When I attempt to reproduce this I always get an IOException instead of a hung client. The only difference is that I am running the server on another Windows machine, not Ubuntu. Will set up an Ubuntu environment to test with. Unhandled exception: |
| Comment by Aristarkh Zagorodnikov [ 31/Mar/11 ] |
|
Here I allow one test to finish and then stop the database when the 2nd test is in progress.
Thu Mar 31 12:07:32 [initandlisten] MongoDB starting : pid=2013 port=27017 dbpath=/data/mongodb 64-bit
Thu Mar 31 12:07:32 [startReplSets] replSet STARTUP2
Thu Mar 31 12:07:35 [initandlisten] connection accepted from 192.168.7.241:55912 #4
Thu Mar 31 12:07:35 [rs Manager] replSet info electSelf 3
Thu Mar 31 12:07:43 [initandlisten] connection accepted from 192.168.7.44:5395 #6
Thu Mar 31 12:07:43 [conn6] getmore test1.items cid:2776051345393985494 getMore: {} bytes:4194290 nreturned:144630 130ms
Thu Mar 31 12:07:44 [initandlisten] connection accepted from 192.168.7.242:47303 #7
Thu Mar 31 12:07:46 [conn6] getmore test1.items cid:2776051345393985494 getMore: {} bytes:3976616 nreturned:137124 129ms
Thu Mar 31 12:07:50 [conn6] end connection 192.168.7.44:5395
Thu Mar 31 12:07:51 [initandlisten] connection accepted from 192.168.7.44:5396 #8
Thu Mar 31 12:07:52 [conn8] getmore test1.items cid:5322570835678218172 getMore: {} bytes:4194290 nreturned:144630 109ms
Thu Mar 31 12:07:53 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends
Thu Mar 31 12:07:53 [interruptThread] now exiting
Thu Mar 31 12:07:53 [conn1] end connection 127.0.0.1:55279 |
| Comment by Aristarkh Zagorodnikov [ 31/Mar/11 ] |
|
Nope, I did a graceful stop using the script that comes with the Ubuntu package: sudo service mongodb stop |
| Comment by Robert Stam [ 31/Mar/11 ] |
|
OK. So it looks like part of the key to reproducing this is to make sure the "items" collection has enough items in it that this loop takes many seconds to execute, giving you time to stop the server. How are you stopping the server? Ctrl-C? Thanks. |
| Comment by Aristarkh Zagorodnikov [ 31/Mar/11 ] |
|
I still use the same test code as before:

catch (Exception ex) { Console.WriteLine(ex.ToString()); } Console.ReadKey();

celestine-3 is running Linux x86-64 (Ubuntu 10.04), with the mongodb 1.8.0 release (binary from the 10gen Ubuntu repo) as a primary member of a 3-member replica set (this doesn't matter: secondaries fail in the same way, and I guess that non-replicated machines would do the same). |
| Comment by Robert Stam [ 31/Mar/11 ] |
|
Is there an easy way to reproduce this? |