[CSHARP-3194] Remove blocking code in StreamExtensionMethods Created: 21/Aug/20  Updated: 24/Sep/20  Resolved: 17/Sep/20

Status: Closed
Project: C# Driver
Component/s: API
Affects Version/s: 2.11.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Hans Olav Loftum Assignee: Robert Stam
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows, Linux, Mac
.net core 3.1



 Description   

StreamExtensionMethods.WriteAsync() and StreamExtensionMethods.ReadAsync() often block concurrent reads and writes.

This can lead to threadpool starvation, which in turn takes down any application using the driver.

 

The two methods use a System.Threading.Timer for detecting timeout.
The Timer's constructor will block because of a lock on TimerQueue.Instance in TimerQueueTimer:

https://github.com/microsoft/referencesource/blob/master/mscorlib/system/threading/timer.cs#L565

 

Our blocking detector reports these stacktraces:

 

Blocking method has been invoked and blocked, this can lead to threadpool starvation. at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period) at System.Threading.TimerQueueTimer..ctor(TimerCallback timerCallback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer.TimerSetup(TimerCallback callback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer..ctor(TimerCallback callback, Object state, TimeSpan dueTime, TimeSpan period) at MongoDB.Driver.Core.Misc.StreamExtensionMethods.WriteAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)

 

Blocking method has been invoked and blocked, this can lead to threadpool starvation. at System.Threading.TimerQueueTimer.Change(UInt32 dueTime, UInt32 period) at System.Threading.TimerQueueTimer..ctor(TimerCallback timerCallback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer.TimerSetup(TimerCallback callback, Object state, UInt32 dueTime, UInt32 period, Boolean flowExecutionContext) at System.Threading.Timer..ctor(TimerCallback callback, Object state, TimeSpan dueTime, TimeSpan period) at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, TimeSpan timeout, CancellationToken cancellationToken)

 

An alternative pattern to handle timeouts (parameters stripped for brevity):

 

public static async Task<int> ReadAsync(..., CancellationToken cancellationToken)

{

  var readTask = DoReadAsync(..., cancellationToken);

  var timeoutTask = Task.Delay(timeout, cancellationToken);

  if (await Task.WhenAny(readTask, timeoutTask) == timeoutTask)

 

{     // Handle cancellation of read (and close stream)     // Perhaps throw exception   }

  else

 

{     return readTask.Result;   }

}



 Comments   
Comment by Hans Olav Loftum [ 24/Sep/20 ]

I appreciate that you have taken the time to investigate this.

And I agree that 0.0000153 seems fast.

 

However, it sounds like the computer the tests are run on is fairly powerful, which isn't the case for most webserver hardware.

The test results don't mention any min/max or median times either.

Besides: If I could choose between blocking for 0.0000153 seconds and 0.000000000 seconds, I would have chosen the latter.

(I don't know how much time it would take to create an additional Task, but that creation would not block other threads.)

 

> this is not the only place we use Timer, would we change them all?

I would not draw any conclusions on this on the MongoDB driver code.

But in general: It is beneficial to avoid locks on static objects as much as possible, in methods that will be run concurrently.

 

> this is not the only place we use the C# lock statement, would we change those also?

Again: I would not draw any conclusions. And in many cases there are good reasons for using locks.

 

We do see thread blocks in other parts of the driver code also, e.g.

MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.ListConnectionHolder.Acquire()

but I have not yet been able to look enough into it to see if this actually can be avoided, and how it possibly could be avoided.

 

 

 

 

 

Again: Thank you for your time investigating this.

I would like to emphasize that my goal is to contribute to make the mongodb driver better, and not to criticize neither code nor individuals.

Comment by Robert Stam [ 15/Sep/20 ]

I think this is just a false positive from your blocking detector tool.

I created a test application to measure whether blocking is actually a problem when creating new Timers, and found that it is not. The test runs for 30 seconds at a time, each time with a different number of threads. Each thread just creates and dispose Timers as fast as it can. This is a worst case test that creates the worst possible contention on the lock buried inside the Timer class. A real world scenario would have much less contention because it would be doing other work besides just creating and disposing timers.

Here's the output of the test:

Number of threads: 1
Average time to construct a Timer: 2 ticks (0.0000002 seconds) 85,934,019 Timers created
 
Number of threads: 4
Average time to construct a Timer: 4 ticks (0.0000004 seconds) 179,136,402 Timers created
 
Number of threads: 8
Average time to construct a Timer: 9 ticks (0.0000009 seconds) 245,180,856 Timers created
 
Number of threads: 10
Average time to construct a Timer: 12 ticks (0.0000012 seconds) 332,677,333 Timers created
 
Number of threads: 100
Average time to construct a Timer: 47 ticks (0.0000047 seconds) 420,135,177 Timers created
 
Number of threads: 1000
Average time to construct a Timer: 106 ticks (0.0000106 seconds) 441,544,369 Timers created
 
Number of threads: 10000
Average time to construct a Timer: 153 ticks (0.0000153 seconds) 458,379,286 Timers created

 Notes:

  • with only 1 thread there is no lock contention and it is no surprise that this is the test with the fastest timer creation
  • my computer has 8 logical processors so anything beyond 8 threads begins to be affected by thread switching overhead
  • 10,000 threads is a very large number of threads that far exceeds what is reasonable
  • even though 10,000 threads is a very large number of threads resulting in the highest contention of all the scenarios tested and while the average time to construct a Timer is 70x larger than for 1 thread the important thing to note is that the average time to create a new Timer is still a very fast 0.0000153 seconds

Based on this information I don't think this change is justified.

Furthermore:

  • this is not the only place we use Timer, would we change them all?
  • this is not the only place we use the C# lock statement, would we change those also?
  • there are undoubtedly locks in lots of external code we don't control, those we can't change anyway

Thanks for suggesting the change, but I just don't think it is justified.

Comment by Hans Olav Loftum [ 31/Aug/20 ]

Our web application performs at least one read operation per request. Most requests perform multiple reads and writes.

During low load we experience one block per minute or so. During higher load we see blocks every second. Our application also use several other frameworks, which all have some amount of blocking code. All blocking calls, however small, will sum up.

We are trying our best to make our application perform, which includes looking into the frameworks we are using. These calls might not block for a “long” time, but they do happen frequently under load.

And since there are non-blocking alternatives that are easy to implement, we would really appreciate this to be changed.

 

Therefore we kindly ask to have this changed, even though it isolated might not seem like a big deal.

Comment by Robert Stam [ 28/Aug/20 ]

Is this a real problem?

My guess is that Timer rarely blocks on the lock and when it does the lock is held for a very short time and there really isn't any significant contention.

Comment by Hans Olav Loftum [ 21/Aug/20 ]

Not sure if `API` was the correct component for this issue.

Generated at Wed Feb 07 21:44:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.