[CSHARP-3194] Remove blocking code in StreamExtensionMethods Created: 21/Aug/20 Updated: 24/Sep/20 Resolved: 17/Sep/20 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | API |
| Affects Version/s: | 2.11.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Hans Olav Loftum | Assignee: | Robert Stam |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows, Linux, Mac |
||
| Description |
|
StreamExtensionMethods.WriteAsync() and StreamExtensionMethods.ReadAsync() often block concurrent reads and writes. This can lead to threadpool starvation, which in turn takes down any application using the driver.
The two methods use a System.Threading.Timer for detecting timeout. https://github.com/microsoft/referencesource/blob/master/mscorlib/system/threading/timer.cs#L565
Our blocking detector reports these stacktraces:
Blocking method has been invoked and blocked, this can lead to threadpool starvation. at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period) at System.Threading.TimerQueueTimer..ctor(TimerCallback timerCallback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer.TimerSetup(TimerCallback callback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer..ctor(TimerCallback callback, Object state, TimeSpan dueTime, TimeSpan period) at MongoDB.Driver.Core.Misc.StreamExtensionMethods.WriteAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
Blocking method has been invoked and blocked, this can lead to threadpool starvation. at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period) at System.Threading.TimerQueueTimer..ctor(TimerCallback timerCallback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer.TimerSetup(TimerCallback callback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer..ctor(TimerCallback callback, Object state, TimeSpan dueTime, TimeSpan period) at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)
An alternative pattern to handle timeouts (parameters stripped for brevity):
public static async Task<int> ReadAsync(..., CancellationToken cancellationToken) { var readTask = DoReadAsync(..., cancellationToken); var timeoutTask = Task.Delay(timeout, cancellationToken); if (await Task.WhenAny(readTask, timeoutTask) == timeoutTask) { // Handle cancellation of read (and close stream) // Perhaps throw exception } else { return readTask.Result; } } |
| Comments |
| Comment by Hans Olav Loftum [ 24/Sep/20 ] | ||||||||||||||||||||
|
I appreciate that you have taken the time to investigate this. And I agree that 0.0000153 seems fast.
However, it sounds like the computer the tests are run on is fairly powerful, which isn't the case for most webserver hardware. The test results don't mention any min/max or median times either. Besides: If I could choose between blocking for 0.0000153 seconds and 0.000000000 seconds, I would have chosen the latter. (I don't know how much time it would take to create an additional Task, but that creation would not block other threads.)
> this is not the only place we use Timer, would we change them all? I would not draw any conclusions on this on the MongoDB driver code. But in general: It is beneficial to avoid locks on static objects as much as possible, in methods that will be run concurrently.
> this is not the only place we use the C# lock statement, would we change those also? Again: I would not draw any conclusions. And in many cases there are good reasons for using locks.
We do see thread blocks in other parts of the driver code also, e.g. MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.ListConnectionHolder.Acquire() but I have not yet been able to look enough into it to see if this actually can be avoided, and how it possibly could be avoided.
Again: Thank you for your time investigating this. I would like to emphasize that my goal is to contribute to make the mongodb driver better, and not to criticize neither code nor individuals. | ||||||||||||||||||||
| Comment by Robert Stam [ 15/Sep/20 ] | ||||||||||||||||||||
|
I think this is just a false positive from your blocking detector tool. I created a test application to measure whether blocking is actually a problem when creating new Timers, and found that it is not. The test runs for 30 seconds at a time, each time with a different number of threads. Each thread just creates and dispose Timers as fast as it can. This is a worst case test that creates the worst possible contention on the lock buried inside the Timer class. A real world scenario would have much less contention because it would be doing other work besides just creating and disposing timers. Here's the output of the test:
Notes:
Based on this information I don't think this change is justified. Furthermore:
Thanks for suggesting the change, but I just don't think it is justified. | ||||||||||||||||||||
| Comment by Hans Olav Loftum [ 31/Aug/20 ] | ||||||||||||||||||||
|
Our web application performs at least one read operation per request. Most requests perform multiple reads and writes. During low load we experience one block per minute or so. During higher load we see blocks every second. Our application also use several other frameworks, which all have some amount of blocking code. All blocking calls, however small, will sum up. We are trying our best to make our application perform, which includes looking into the frameworks we are using. These calls might not block for a “long” time, but they do happen frequently under load. And since there are non-blocking alternatives that are easy to implement, we would really appreciate this to be changed.
Therefore we kindly ask to have this changed, even though it isolated might not seem like a big deal. | ||||||||||||||||||||
| Comment by Robert Stam [ 28/Aug/20 ] | ||||||||||||||||||||
|
Is this a real problem? My guess is that Timer rarely blocks on the lock and when it does the lock is held for a very short time and there really isn't any significant contention. | ||||||||||||||||||||
| Comment by Hans Olav Loftum [ 21/Aug/20 ] | ||||||||||||||||||||
|
Not sure if `API` was the correct component for this issue. |