[JAVA-3647] Race condition between closing a connection and reusing it Created: 06/Mar/20 Updated: 28/Oct/23 Resolved: 13/Mar/20 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Reactive Streams |
| Affects Version/s: | 4.0.0 |
| Fix Version/s: | 4.0.1, 3.12.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Boris Petrov | Assignee: | Ross Lawley |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
MongoDB server 4.2.3 running on Linux. |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
I'm not sure this is a problem that started from version 4 of the driver. Here goes. Up to now I was using `org.mongodb:mongodb-driver-reactivestreams:1.13.0`. This was my code:
As you see I was using the deprecated `maxWaitQueueSize` because otherwise my code was getting exceptions because of "Too many operations are already waiting for a connection". Now, with that method gone in version 4, running my code also led to millions of exceptions but this time due to a different problem. What I am getting every time is this:
I enabled logging for the driver and followed this "localValue:21" to see what happens with it. This is the chain of commands I see:
As you see, closing of the connection is initiated but then the connection is checked out to be used again. I guess that is what leads to this issue. It happens every single time. What I did to "fix" it was:
Setting the connections to not expire seems to fix the problem (which acknowledges what I'm saying about a race-condition somewhere).
|
| Comments |
| Comment by Githook User [ 16/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'rozza', 'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com'}Message: Ensure getting the description doesn't require the pool to be open.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 13/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Great, glad to have helped! I'll be sure to try out the new version and will let you know if there are more issues. Thanks for the support! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ross Lawley [ 13/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi alien, I believe with your help we have identified the source of the bug. Many thanks for your patience and quick response to questions with regards to this issue. A fix will be released in the 4.0.1 version of the driver. All the best, Ross | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 13/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'rozza', 'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com'}Message: Ensure getting the description doesn't require the pool to be open.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 13/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'username': 'rozza', 'name': 'Ross Lawley', 'email': 'ross.lawley@gmail.com'}Message: Ensure getting the description doesn't require the pool to be open.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 11/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ah, sorry, I didn't get that you meant the MongoDB server. Well, as far as I can see from the logs, nothing too interesting happens:
Then a bunch of:
Some updates, inserts, etc. I tried searching for "err" but nothing appears. Note that I'm using Java 13. It shouldn't matter but you never know. Keep me updated with your try. If you can't reproduce it, let me know and we'll see how to proceed. Thank you too! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi alien My request is for server logs, not application logs. While there may not be anything special about your application, it's possible that your MongoDB deployment is different in a way that would make it difficult to reproduce. We often ask for reproducers even if it seems like there is nothing special about the application. For example, I don't know how many concurrent requests the application is executing. That said, I do plan to try building my own reproducer. Thanks for your continued patience. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 11/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've sent you the full logs - if there were any application exceptions they would have been in it. So there is nothing more I can give you as information unfortunately. As for the reproduction - I haven't tried. But I don't think our application is special in any way. Have you tried running a simple app that just reads and writes a million times a second to a Mongo collection for a few minutes? I'm pretty sure it will break in the same way. If you're having troubles reproducing the problem even with such an example app, I could try doing one, but that will be in a few days at least. Thanks! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 11/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi alien Thank you for the traces. I see that the first error is java.nio.channels.AsynchronousCloseException at Mar 11 07:08:02 UTC 2020. Can you locate any related messages in the server logs that correspond to this exception? Also, do you have a reproducer that you could either attach or describe? Or can you only repro in an actual application that you're running? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 11/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I've attached 2 traces. There's so many and different errors happening that I'm not sure this will help you. Just to let you know that we've downgraded to 1.13.0 due to these problems. Thanks for the support! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 10/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Would it be possible to re-run your tests with only debug logging enabled, and not trace? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 08/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We got this exception even with the "fix" I specified above:
Although it appeared after a number of runs of our tests so it's very rare. I guess this is again due to some race-condition somewhere. Not sure if it is related to the original issue. P.S. I saw a number of those in the traces in the original issue so it's probably related. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Boris Petrov [ 07/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the support! I've attached a trace file. Be sure to `less -R` it (after extracting, of course). I'm not sure that these will help you pinpoint the problem but I hope they can convince you something is amiss. Please tell me if I can do something more! | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Jeffrey Yemin [ 06/Mar/20 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for reporting this. With the information you've provided, I can't fully reconstruct the chain of events. I gather you don't see any messages like:
? What other log messages (for other connections presumably) do you see between these two:
If you could provide the full logs for the time between the first and last log messages you provided (including all connection activity), that would be helpful. |