[CSHARP-2423] Reduce the amount of work done while holding the lock in server session and connection pools Created: 02/Nov/18 Updated: 08/Jun/23 Resolved: 08/Jun/23

| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Performance |
| Affects Version/s: | 2.7.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Valentin Abalmasov | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
Hello there.
| Comments |
| Comment by Robert Stam [ 04/Dec/18 ] |

Thank you for your pull request. I've examined it and I don't think we can merge it as is, for a few reasons.

One is that you have replaced the List<T> classes with ConcurrentQueue<T> classes, which changes the semantics of which session/connection is acquired next from the pool from LIFO to FIFO. It is important that we use LIFO in both cases because we always want to reuse the most recently used session/connection and let unneeded sessions/connections get old and be discarded. Your pull request could potentially be changed from ConcurrentQueue<T> to ConcurrentStack<T> to achieve those semantics. Another reason is that to clean out stale sessions/connections we need access to the bottom of the stack as well as the top, and ConcurrentStack<T> does not provide any way to do anything with the stack except try to pop from the top.

I tested against your pull request and I can confirm that there was a modest performance improvement due to reduced lock contention. However, I still saw CPU utilization spike to 100% somewhere between 100 and 200 threads, once again with no obvious impact on throughput. So while your pull request does improve performance, it doesn't change CPU utilization spiking to 100%, and it changes the semantics of how the pools work in ways that are not desirable.

I have found that we can achieve similar modest performance improvements by simply reducing the amount of work we do while holding the lock, while keeping the implementation essentially the same (and therefore the behavior of the pools the same as well). We will be code reviewing my proposed changes internally and I will update this ticket once we decide whether those changes can be incorporated.

Thanks again for the pull request. Even though we can't merge it, it was very helpful in identifying where we could reduce lock contention.
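For illustration, here is a minimal sketch of the pattern described above: keep the existing LIFO, list-based pool, but shrink the critical section so that only the list manipulation happens while holding the lock, and do expiry checks, disposal, and connection creation outside it. This is not the driver's actual code; the PooledConnection and ConnectionPool names and members below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; not the driver's implementation.
public class PooledConnection : IDisposable
{
    public DateTime LastUsedAt { get; set; } = DateTime.UtcNow;
    public bool IsExpired => DateTime.UtcNow - LastUsedAt > TimeSpan.FromMinutes(10);
    public void Dispose() { /* release the underlying socket, etc. */ }
}

public class ConnectionPool
{
    private readonly object _lock = new object();
    private readonly List<PooledConnection> _connections = new List<PooledConnection>();

    public PooledConnection Acquire()
    {
        while (true)
        {
            PooledConnection candidate = null;

            // Critical section only manipulates the list: LIFO, take from the end.
            lock (_lock)
            {
                var last = _connections.Count - 1;
                if (last >= 0)
                {
                    candidate = _connections[last];
                    _connections.RemoveAt(last);
                }
            }

            // Everything expensive happens outside the lock.
            if (candidate == null)
            {
                return CreateConnection();
            }
            if (candidate.IsExpired)
            {
                candidate.Dispose();
                continue; // try the next pooled connection
            }
            return candidate;
        }
    }

    public void Release(PooledConnection connection)
    {
        connection.LastUsedAt = DateTime.UtcNow; // updated outside the lock
        lock (_lock)
        {
            // Returned to the end, so the most recently used connection is reused first.
            _connections.Add(connection);
        }
    }

    private PooledConnection CreateConnection() => new PooledConnection();
}
```

The point is simply that the lock now protects a handful of list operations rather than validation or I/O, which reduces contention while preserving the LIFO behavior and the ability to prune stale connections from the other end of the list.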
| Comment by Robert Stam [ 04/Dec/18 ] |

I am able to reproduce CPU utilization going to 100%, but it doesn't actually seem to be a problem. CPU utilization jumps to 100% somewhere between 100 and 200 threads (probably depends on your hardware and test scenario) but throughput remains similar. My theory is that once lock contention reaches a certain level there is some busy waiting in the implementation of the lock statement that manifests itself as 100% CPU utilization. But nothing hangs. Here's the output of a typical test run of my test program (attached as TestCSharp243.zip) run against the current driver implementation:
Note that two threads is twice as fast as one thread, and that eight threads is very close to eight times as fast as one thread. Total elapsed time continues to drop with more threads, with the best performance occurring around 50 threads. But even with all the overhead of lock contention and thread context switching, 10000 threads is still much faster than a single thread. And it definitely isn't hanging.

That being said, I agree that we should refactor the code to do less work while holding the lock, which will reduce lock contention.
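The attached TestCSharp243.zip is not reproduced in this export. As a rough illustration only, a throughput test of the kind described above might look like the sketch below: start N dedicated threads, have each perform a fixed number of operations, and compare elapsed time as N grows. All names and numbers here are assumptions, not the actual test program.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PoolThroughputTest
{
    public static void Main()
    {
        // Thread counts chosen to bracket the range discussed in this ticket.
        foreach (var threadCount in new[] { 1, 2, 8, 50, 100, 200 })
        {
            var stopwatch = Stopwatch.StartNew();

            var threads = new Thread[threadCount];
            for (var i = 0; i < threadCount; i++)
            {
                threads[i] = new Thread(() => RunOperations(operationsPerThread: 10_000));
                threads[i].Start();
            }
            foreach (var thread in threads)
            {
                thread.Join();
            }

            stopwatch.Stop();
            Console.WriteLine($"{threadCount} threads: {stopwatch.ElapsedMilliseconds} ms");
        }
    }

    private static void RunOperations(int operationsPerThread)
    {
        for (var i = 0; i < operationsPerThread; i++)
        {
            // In the real test this would acquire a session/connection from the pool
            // and run a small operation against the server.
        }
    }
}
```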
| Comment by Valentin Abalmasov [ 02/Dec/18 ] |

We've managed to fix the issue ourselves. Please look at the pull request.
| Comment by Valentin Abalmasov [ 29/Nov/18 ] |

It seems that the lock(this._lock) statement is causing the issue when running with 100+ threads.
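As an illustration of the kind of coarse critical section being described (reusing the hypothetical ConnectionPool/PooledConnection types from the sketch earlier in this ticket, and not the driver's actual code), the sketch below puts expiry scanning, disposal, and connection creation all under the single lock, so that with 100+ threads every acquisition contends on it.

```csharp
// Hypothetical illustration only; not the driver's implementation.
public PooledConnection AcquireWithEverythingUnderTheLock()
{
    lock (_lock)
    {
        // All of this work is done while holding the single pool lock,
        // so with 100+ threads every acquisition serializes on it.
        for (var i = _connections.Count - 1; i >= 0; i--)
        {
            if (_connections[i].IsExpired)
            {
                _connections[i].Dispose();
                _connections.RemoveAt(i);
            }
        }

        var last = _connections.Count - 1;
        if (last >= 0)
        {
            var connection = _connections[last];
            _connections.RemoveAt(last);
            return connection;
        }

        return CreateConnection(); // even new connections are created under the lock
    }
}
```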
| Comment by Valentin Abalmasov [ 29/Nov/18 ] |

Hi there. We've successfully migrated to driver version 2.7.0 (NOT the legacy driver), because we thought the issue might be in the legacy driver. Unfortunately, we've run into the same situation. Please look at the attached screenshot.

BTW, do you have any news regarding that issue? It is very critical for us (