[CSHARP-1184] Investigate Timeouts Created: 25/Feb/15  Updated: 03/Dec/20  Resolved: 03/Dec/20

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 2.0
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Craig Wilson Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

See http://stackoverflow.com/questions/28723222/timeoutexceptions-from-mongo-c-sharp-driver-with-async-api.

There is a configuration option for raising this timeout (MongoClientSettings.OperationTimeout): https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver/MongoClientSettings.cs#L254

This might simply be a case where these operations took just as long with 1.x, but 1.x had no timeouts, so the delay never surfaced as an error.
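To make the 1.x vs. 2.0 difference concrete: an operation timeout turns an await that never completes into a TimeoutException instead of an indefinite wait. The sketch below shows one common way to do that in C#, by racing the operation against Task.Delay. It is a hypothetical illustration only; TimeoutHelpers and WithTimeout are made-up names, and this is not necessarily how the driver applies OperationTimeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutHelpers
{
    // Hypothetical helper (not part of the driver): awaits `operation`, but throws
    // TimeoutException if it has not completed within `timeout`.
    public static async Task<T> WithTimeout<T>(Task<T> operation, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            Task delay = Task.Delay(timeout, cts.Token);
            Task winner = await Task.WhenAny(operation, delay).ConfigureAwait(false);
            if (winner != operation)
            {
                // The timer fired first; the underlying operation is still pending.
                throw new TimeoutException($"Operation did not complete within {timeout}.");
            }

            // The operation won the race; cancel the pending delay to release its timer.
            cts.Cancel();
            return await operation.ConfigureAwait(false);
        }
    }
}
```

Note that this pattern only stops waiting; the underlying operation keeps running. That matches the symptom in this ticket, where raising the timeout did not help because the awaited read itself never completed.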



 Comments   
Comment by Githook User [ 05/Mar/15 ]

Author: rstam <robert@robertstam.org>

Message: CSHARP-1184: Added a ReceiveCoordinator to handle coordinating the responsibilities of the cooperating receive methods.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/031ff8e1720ff617e70f586a462a37d39c6a3109

Comment by Githook User [ 05/Mar/15 ]

Author: rstam <robert@robertstam.org>

Message: CSHARP-1184: Replace outbound queue with a send lock in BinaryConnection.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/46752e822d4138203113812894a3ada5b979ad6e
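As a rough illustration of the general idea behind this commit (not the driver's actual BinaryConnection code), a "send lock" can be an async mutex built from SemaphoreSlim: each caller waits for exclusive access and writes its whole message itself, rather than handing the message to an outbound queue drained by a background writer. SendLockedConnection and SendMessageAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch, not the driver's BinaryConnection: one SemaphoreSlim acts
// as an async "send lock" so concurrent callers write whole messages one at a time.
public sealed class SendLockedConnection
{
    private readonly Stream _stream;
    private readonly SemaphoreSlim _sendLock = new SemaphoreSlim(1, 1);

    public SendLockedConnection(Stream stream)
    {
        _stream = stream ?? throw new ArgumentNullException(nameof(stream));
    }

    public async Task SendMessageAsync(byte[] message, CancellationToken cancellationToken)
    {
        if (message == null) throw new ArgumentNullException(nameof(message));

        // Wait asynchronously for exclusive access instead of enqueuing the message.
        await _sendLock.WaitAsync(cancellationToken).ConfigureAwait(false);
        try
        {
            // The lock holder writes its entire message, so the bytes of two
            // messages can never interleave on the stream.
            await _stream.WriteAsync(message, 0, message.Length, cancellationToken).ConfigureAwait(false);
            await _stream.FlushAsync(cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            _sendLock.Release();
        }
    }
}
```

A lock keeps the send path on the caller's own async flow, which is simpler to reason about than a separate queue-draining task.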

Comment by Michiel Overeem [ 05/Mar/15 ]

Hi Craig, great! Everything seems to work fine with this branch. I am not able to reproduce the bug in our application any longer. Thanks again for fixing this issue.

Comment by Craig Wilson [ 27/Feb/15 ]

Hi Michiel,

So, I was able to repro with a test application. I made a fix on this branch (https://github.com/craiggwilson/mongo-csharp-driver/tree/fix_timeout) and can no longer reproduce the problem. I'm not saying this is the final solution; I just want to prove that the fix holds outside our machines and a little test harness. Since you mentioned this is a full-blown, real application, would you mind pulling down this branch and testing it?

Thanks,
Craig

Comment by Craig Wilson [ 26/Feb/15 ]

No, no repro needed. It doesn't seem like a race condition either. We are legitimately waiting on data. However, it seems that the data has already arrived but stream.ReadAsync is still in a waiting state. We are sitting here (https://github.com/mongodb/mongo-csharp-driver/blob/master/src/MongoDB.Driver.Core/Core/Misc/StreamExtensionMethods.cs#L45), yet the NetworkStream's DataAvailable is true and there are bytes on the socket, so we are working through this really, really low-level issue.
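For context on where the code is stuck: Stream.ReadAsync may return fewer bytes than requested, so code that needs an exact number of bytes usually loops until the buffer is full. The helper below is a minimal sketch of that pattern under assumed names (StreamReadHelpers.ReadBytesAsync is hypothetical and not copied from StreamExtensionMethods.cs). In this pattern, the symptom described above would show up as one of the awaited ReadAsync calls never completing even though data is available.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class StreamReadHelpers
{
    // Hypothetical helper: reads exactly `count` bytes into `buffer` starting at `offset`.
    // ReadAsync may return fewer bytes than requested, so keep reading until the range is filled.
    public static async Task ReadBytesAsync(
        Stream stream, byte[] buffer, int offset, int count, CancellationToken cancellationToken)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (offset < 0 || offset > buffer.Length) throw new ArgumentOutOfRangeException(nameof(offset));
        if (count < 0 || count > buffer.Length - offset) throw new ArgumentOutOfRangeException(nameof(count));

        while (count > 0)
        {
            // For a NetworkStream this normally completes once at least one byte
            // has arrived or the connection has been closed.
            int bytesRead = await stream.ReadAsync(buffer, offset, count, cancellationToken).ConfigureAwait(false);
            if (bytesRead == 0)
            {
                // End of stream before the full message arrived.
                throw new EndOfStreamException();
            }
            offset += bytesRead;
            count -= bytesRead;
        }
    }
}
```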

Comment by Michiel Overeem [ 26/Feb/15 ]

Cool! So no repro from me needed?

Comment by Craig Wilson [ 26/Feb/15 ]

Yeah, that's what I meant (whether you are using something small and reproducible or your full system). Nevertheless, I've got a repro now and we are tracking it down. Thanks so much for the report.

Comment by Michiel Overeem [ 26/Feb/15 ]

Hi Craig, I was afraid of that. What do you mean by what we are using? Our application? Or the versions of the various libraries? It is a commercial piece of software based on the CQRS pattern. What I am going to do is find a way to build a small reproduction; hopefully that will help.

Comment by Craig Wilson [ 26/Feb/15 ]

Thanks, Michiel. This sounds exactly like a race condition. They are usually heisenbugs. Once we can reproduce it reliably, we'll be able to fix it. What are you using when you see the issue?

Comment by Michiel Overeem [ 26/Feb/15 ]

I traced through the source code and can now see that it hangs on this line in QueryWireProtocol.ExecuteAsync, invoked by a FindOperation:
var reply = await connection.ReceiveMessageAsync<TDocument>(message.RequestId, _serializer, _messageEncoderSettings, cancellationToken).ConfigureAwait(false);

However, when I step through it, the error does not occur, so timing and locking might be an issue here.

Comment by Michiel Overeem [ 26/Feb/15 ]

I do not think it has to do with slow queries that are suddenly canceled by the timeout. When I raise the setting to 2 minutes, I get the same result, and those queries do not take this long with the current v1 version of the driver. It seems more like a blocking issue, but I have no idea where to begin logging and debugging this.

Generated at Wed Feb 07 21:38:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.