[CSHARP-408] WaitQueueTimeout default value Created: 11/Mar/12  Updated: 02/Apr/15  Resolved: 15/Mar/12

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.3.1
Fix Version/s: 1.4

Type: Improvement Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Minor Change

 Description   

In production, when the MongoDB server is close to hitting its I/O subsystem limits (we have a write-intensive application that makes MongoDB spend much of its time write-locked, but that's another story), the mean query execution time starts to rise whenever MongoDB flushes its memory-mapped files (every 60 seconds by default), which is completely expected. This, of course, leads to additional connections being opened, because requests keep arriving at a steady pace. If the aforementioned flush takes about a second (yes, I know this isn't a healthy value, but for now we have to live with it), the connection pool overflows and our backend threads start getting TimeoutException (hopefully CSHARP-393 will change that to something more expected), since they hit the wait queue timeout.

I wonder why WaitQueueTimeout defaults to such a low value (500 ms). If the connection pool is close to exhaustion because of an intermittent slowdown due to a disk flush, LVM snapshotting, OS cache pollution, etc., most of the waiting threads simply won't make it within 500 ms. If there is no particular reason for it to be so low, I propose making it half the connect timeout, which would translate to 15 seconds.

While I understand that in the case of a severe server slowdown this increase would lead to a lot of hanging threads, in typical parallel client scenarios (ASP.NET, WCF services, etc.) these threads come from a thread pool whose effective limit depends on the number of processors, so the increase would not oversaturate the OS with an excessive number of threads and would not exhaust memory with thread stacks. The benefit of increasing the timeout is clear: a 1-2 second server "lag" would no longer abort roughly half of the requests (our scenario) just because the connection pool happened to be saturated at that moment.

I also understand that we can set it on a per-database basis in client code, but I still recommend changing the default: people with low loads won't notice the change, and people who are close to their servers' capacity won't get a truckload of TimeoutExceptions in their face =)
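For reference, a minimal sketch of the per-connection workaround mentioned above, assuming the 1.x-era driver API this ticket concerns; `waitQueueTimeoutMS` and `waitQueueMultiple` are the standard connection-string spellings of these settings, and the host name is a placeholder:

```csharp
using MongoDB.Driver;

// Sketch: raise the wait queue limits for one connection instead of
// relying on the 500 ms default. waitQueueTimeoutMS is in milliseconds;
// 120000 ms matches the Java driver's 2-minute default discussed below.
var url = "mongodb://localhost/?waitQueueTimeoutMS=120000&waitQueueMultiple=5";
var server = MongoServer.Create(url);       // 1.x-era entry point
var database = server.GetDatabase("test");  // the settings apply to every
                                            // database obtained from this server
```

The settings live on the server/connection level, so every database handle created from the same `MongoServer` inherits them.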

Of course, I may be missing some point that justifies the low default value; sorry to bother you with this if that's the case.



 Comments   
Comment by Aristarkh Zagorodnikov [ 13/Mar/12 ]

I think a relatively large timeout that's uniform across the different drivers would be very good (not to mention that 2 minutes is more than enough to sort out our problems).

Comment by Robert Stam [ 13/Mar/12 ]

So the proposed fix is to change the default values to be the same as those used by the Java driver:

WaitQueueMultiple = 5.0
WaitQueueTimeout = 2 minutes

Comment by Robert Stam [ 13/Mar/12 ]

[Corrected]

The default value of WaitQueueTimeout in the Java driver is 2 minutes, and I am told that is working well for most people. We would like the default value to match across drivers. Also, the Java driver has a default WaitQueueMultiple of 5 instead of 1, which means many more threads can be waiting.

Would a default value of 2 minutes work for you?

Generated at Wed Feb 07 21:36:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.