[JAVA-4642] Replace synchronized block with ReentrantLock in BaseCluster Created: 11/Jun/22 Updated: 28/Oct/23 Resolved: 27/Jul/22 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Internal |
| Affects Version/s: | None |
| Fix Version/s: | 4.8.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jeffrey Yemin | Assignee: | Jeffrey Yemin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | loom | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Epic Link: | Virtual Threads Support |
| Description |
|
Testing JEP 425/428 using a JDK 19 early access build (build 26), I am able to create a scenario where all virtual threads are starved of carrier threads due to the fact that all available carrier threads are pinned by synchronized blocks. Multiple virtual threads are observed (via jcmd) to be blocked waiting to enter the synchronized block in BaseCluster#withLock. Meanwhile, one thread is observed to be in that same block and, within that block, waiting to acquire the ReentrantLock in ConcurrentPool#lockUnfair. Another thread is also trying to acquire that same lock. No threads appear to actually hold that lock. So what I think is happening is that neither of those two threads waiting to acquire the ReentrantLock are able to do so because all available carrier threads are pinned to other virtual threads waiting to enter the synchronized block in BaseCluster#withLock, and in particular the one that holds the BaseCluster lock is unable to unwind the stack in order to release it. We can avoid this scenario by replacing the synchronized block in BaseCluster#withLock with a ReentrantLock. Note: this scenario seems to contradict this part of JEP 425:
Specifically this part:
as this lock is guarding in-memory operations. JEP 425 does state that:
But it's not clear if that improvement will come before or after JEP 425 becomes a non-preview feature. And even so, we would like the driver to work well with JEP 425/428 even while they are still in preview. |
| Comments |
| Comment by Githook User [ 27/Jul/22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Jeff Yemin', 'email': 'jeff.yemin@mongodb.com', 'username': 'jyemin'}Message: Replace synchronized with ReentrantLock (#984)
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Valentin Kavalenka [ 23/Jun/22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We got a confirmation that the behavior reported in this ticket is intended. This means we should just replace all synchronized methods and blocks with Java SE API locks (see | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Valentin Kavalenka [ 23/Jun/22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Asked a question in the loom-dev email list, waiting for response here: https://mail.openjdk.org/pipermail/loom-dev/2022-June/004750.html | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Valentin Kavalenka [ 21/Jun/22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I agree: trying to enter a synchronized block pins the virtual thread to its carrier. Since this arguably contradicts JEP 425, I created a reproducer that is not specific to the driver. Here is how it works:
If we run the application, we see
In the thread dump
we see that both carrier threads are executing a task:
both taskA are being executed (one is pinned by pinWithInfinitLoop, the other one is unfortunately pinned while trying to enter the synchronized block in withSynchronized)), and taskB ("tid": "23") has not even started because there are no carriers left available:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Jun/22 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Note: I can't reproduce this on the master branch, but I can with 4.6.x branch. My hypothesis is that the changes to the buffer and session pools hide the issue somehow. |