[CSHARP-2423] Reduce the amount of work done while holding the lock in server session and connection pools Created: 02/Nov/18  Updated: 08/Jun/23  Resolved: 08/Jun/23

Status: Closed
Project: C# Driver
Component/s: Performance
Affects Version/s: 2.7.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Valentin Abalmasov Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive TestCSharp2423.zip     PNG File lock.png     PNG File mongo.png    
Issue Links:
Related
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


 Description   

Hello there.
 
We have a weird situation in a production environment.
MongoDB 4.0.2
C# driver 2.7.0 (Legacy)
 
Our software currently uses 64 threads to work with MongoDB and everything is fine, but we've migrated to more powerful software and want to use 128 threads.
When we set the number of threads to 128, the server hangs with 100% CPU consumption.
Using dotTrace we've found that the time is spent in the AcquireSession and ReleaseSession calls of ConnectionPool.
 
In the MongoDB connection string we've tried setting maxPoolSize to 500, which was supposed to help, but it doesn't. Whatever we set in the connection string, it doesn't work as expected and CPU usage stays at 100%.
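For example, something along these lines (the host name here is just a placeholder):

mongodb://our-server:27017/?maxPoolSize=500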
 
It seems like some sort of problem with connection pooling, but we haven't figured out how to fix it yet.
 
Please see the attached file for the dotTrace logs.
 
Any recommendations are much appreciated.
 
Thanks,
Valentin



 Comments   
Comment by Robert Stam [ 04/Dec/18 ]

Thank you for your pull request. I've examined it and I don't think we can merge it as is for a few reasons.

One is that you have replaced the List<T> classes with ConcurrentQueue<T> classes, which changes the semantics of which session/connection is acquired next from the pool from LIFO to FIFO. It is important that we use LIFO in both cases because we always want to reuse the most recently used session/connection and let unneeded sessions/connections get old and be discarded. Potentially your pull request could be changed from using ConcurrentQueue<T> to using ConcurrentStack<T> to achieve these semantics.
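As a rough illustration of the semantic difference (a sketch, not driver code):

// With a queue, TryDequeue always hands back the item that has been idle the longest,
// so every pooled session/connection keeps getting "refreshed" and none ever ages out.
// With a stack, TryPop hands back the most recently returned item, so items near the
// bottom stay idle long enough to be recognized as stale and discarded.
using System;
using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>();
var stack = new ConcurrentStack<int>();

foreach (var id in new[] { 1, 2, 3 })
{
    queue.Enqueue(id); // FIFO pool
    stack.Push(id);    // LIFO pool
}

queue.TryDequeue(out var fromQueue); // fromQueue == 1, the oldest item
stack.TryPop(out var fromStack);     // fromStack == 3, the most recently used item

Console.WriteLine($"FIFO returned {fromQueue}, LIFO returned {fromStack}");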

Another is that to clean out stale sessions/connections we need access to the bottom of the stack as well as the top, and ConcurrentStack<T> does not provide any way to do anything with the stack except to try and pop from the top of the stack.
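A sketch of the kind of access the pool needs (again, not the actual implementation): acquire from the top in LIFO order, but prune expired items from the bottom, which a List<T> guarded by a lock supports and ConcurrentStack<T> does not.

using System;
using System.Collections.Generic;

class PooledItem
{
    public DateTime LastUsedUtc;
    public bool IsExpired(TimeSpan maxIdle) => DateTime.UtcNow - LastUsedUtc > maxIdle;
}

class LifoPool
{
    private readonly object _lock = new object();
    private readonly List<PooledItem> _items = new List<PooledItem>();

    public void Release(PooledItem item)
    {
        item.LastUsedUtc = DateTime.UtcNow;
        lock (_lock) { _items.Add(item); } // the most recently used item ends up on top
    }

    public PooledItem TryAcquire()
    {
        lock (_lock)
        {
            if (_items.Count == 0) return null;
            var top = _items[_items.Count - 1]; // top of the "stack"
            _items.RemoveAt(_items.Count - 1);
            return top;
        }
    }

    public void PruneStale(TimeSpan maxIdle)
    {
        lock (_lock)
        {
            // The oldest items sit at the bottom (index 0), so stop at the first live one.
            while (_items.Count > 0 && _items[0].IsExpired(maxIdle))
            {
                _items.RemoveAt(0);
            }
        }
    }
}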

I tested against your pull request and I can confirm that there was a modest performance improvement due to reduced lock contention. However, I still saw CPU utilization spike to 100% somewhere between 100 and 200 threads, once again with no obvious impact on throughput. So while your pull request does improve performance, it doesn't change CPU utilization spiking to 100% and it changes the semantics of how the pools work in ways that are not desirable.

I have found that we can achieve similar modest performance improvements by simply reducing the amount of work we do while holding the lock, while keeping the implementation essentially the same (and therefore the behavior of the pools the same as well).
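The general idea (a sketch only, not the actual proposed change) is to hold the lock just long enough to move expired items out of the shared list, and to do the expensive work of disposing them after the lock has been released:

using System;
using System.Collections.Generic;

class ConnectionPoolSketch
{
    private readonly object _lock = new object();
    private readonly List<IDisposable> _available = new List<IDisposable>();

    public void PruneExpired(Func<IDisposable, bool> isExpired)
    {
        var toDispose = new List<IDisposable>();

        lock (_lock)
        {
            // Cheap work only while holding the lock: partition the shared list.
            for (var i = _available.Count - 1; i >= 0; i--)
            {
                if (isExpired(_available[i]))
                {
                    toDispose.Add(_available[i]);
                    _available.RemoveAt(i);
                }
            }
        }

        // Expensive work (closing sockets, Dispose) happens outside the lock.
        foreach (var connection in toDispose)
        {
            connection.Dispose();
        }
    }
}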

We will be code reviewing my proposed changes internally and I will update this ticket once we decide if those changes can be incorporated.

Thanks again for the pull request. Even though we can't merge it, it was very helpful in identifying where we could reduce lock contention.

Comment by Robert Stam [ 04/Dec/18 ]

I am able to reproduce CPU utilization going to 100%, but it doesn't actually seem to be a problem. CPU utilization jumps to 100% somewhere between 100 and 200 threads (probably depends on your hardware and test scenario) but throughput remains similar. My theory is that once lock contention reaches a certain level there is some busy waiting in the implementation of the lock statement that manifests itself as 100% CPU utilization. But nothing hangs.

Here's the output of a typical run of my test program (attached as TestCSharp2423.zip) against the current driver implementation:

1 threads: 17628 ms
2 threads: 8795 ms
3 threads: 6191 ms
4 threads: 4654 ms
5 threads: 3717 ms
6 threads: 3185 ms
7 threads: 2869 ms
8 threads: 2220 ms
9 threads: 1965 ms
10 threads: 1788 ms
20 threads: 982 ms
50 threads: 564 ms
100 threads: 1232 ms
125 threads: 1731 ms
150 threads: 1713 ms
175 threads: 2250 ms
200 threads: 3045 ms
500 threads: 1759 ms
1000 threads: 5007 ms
2000 threads: 4150 ms
5000 threads: 5038 ms
10000 threads: 3343 ms

Note that two threads is twice as fast as one thread, and that eight threads is very close to eight times as fast as one thread. Total elapsed time continues to drop with more threads with the best performance occurring around 50 threads. But even with all the overhead of lock contention and thread context switching 10000 threads is still much faster than a single thread. And it definitely isn't hanging.

That being said, I agree that we should refactor the code to do less work while holding the lock, which will reduce lock contention.
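A benchmark with this shape (a sketch only, not the attached TestCSharp2423 program; the connection string, database, and collection names are placeholders) runs a fixed total number of small queries split across N threads and reports the elapsed time per thread count:

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

class Program
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017/?maxPoolSize=500");
        var collection = client.GetDatabase("test").GetCollection<BsonDocument>("test");
        var emptyFilter = Builders<BsonDocument>.Filter.Empty;

        const int totalOperations = 100_000;
        foreach (var threadCount in new[] { 1, 2, 4, 8, 16, 50, 100, 200 })
        {
            var opsPerThread = totalOperations / threadCount;
            var stopwatch = Stopwatch.StartNew();

            var tasks = new Task[threadCount];
            for (var t = 0; t < threadCount; t++)
            {
                tasks[t] = Task.Factory.StartNew(() =>
                {
                    for (var i = 0; i < opsPerThread; i++)
                    {
                        // One small round trip per iteration; the pool is the shared resource under test.
                        collection.Find(emptyFilter).FirstOrDefault();
                    }
                }, TaskCreationOptions.LongRunning);
            }
            Task.WaitAll(tasks);

            Console.WriteLine($"{threadCount} threads: {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}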

 

Comment by Valentin Abalmasov [ 02/Dec/18 ]

We've managed to fix the issue ourselves. Please look at the pull request.

Comment by Valentin Abalmasov [ 29/Nov/18 ]

It seems that the lock(this._lock) is what causes the issue when running with 100+ threads.

Comment by Valentin Abalmasov [ 29/Nov/18 ]

Hi there. We've successfully migrated to the 2.7.0 driver (NOT the legacy driver), because we thought the issue might be in the legacy driver.

Unfortunately, we've faced the same situation.

Please look at the attached screenshot.

 

BTW, do you have any news regarding this issue? It is very critical for us.
