[JAVA-4417] Memory leak in AsynchronousTlsChannelGroup Created: 30/Nov/21  Updated: 28/Oct/23  Resolved: 06/Jan/22

Status: Closed
Project: Java Driver
Component/s: Reactive Streams
Affects Version/s: None
Fix Version/s: 4.4.1

Type: Bug Priority: Major - P3
Reporter: Dániel Imre Assignee: Valentin Kavalenka
Resolution: Fixed Votes: 1
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File heapdump.png    
Backwards Compatibility: Fully Compatible
Documentation Changes: Not Needed

 Description   

We are observing a memory leak related to com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.

Driver version is 4.3.4 

Server is Azure CosmosDB using MongoDB Server version 4.0, server-side retries enabled, SSL enabled.

Application is using spring-boot 2.4.5 and reactive stack.

 

The leak starts after the following exception is thrown: 

{{java.nio.channels.ClosedChannelException
at java.base/java.nio.channels.spi.AbstractSelectableChannel.register(Unknown Source)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.registerPendingSockets(AsynchronousTlsChannelGroup.java:612)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.loop(AsynchronousTlsChannelGroup.java:414)
at java.base/java.lang.Thread.run(Unknown Source)}}
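
For illustration only (this is not driver code): the top frame of the trace is standard JDK behavior. Calling register on a selectable channel that has already been closed throws ClosedChannelException. The sketch below reproduces just that step with plain java.nio; the class and method names are hypothetical and chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical demo class, not part of the driver or of tls-channel.
public class ClosedChannelRegisterDemo {

    /** Returns true if registering an already closed channel throws ClosedChannelException. */
    static boolean registerAfterClose() {
        try (Selector selector = Selector.open()) {
            SocketChannel channel = SocketChannel.open();
            channel.configureBlocking(false);
            channel.close(); // simulates a socket that is closed right after creation
            try {
                channel.register(selector, SelectionKey.OP_READ);
                return false; // not expected: registration of a closed channel succeeded
            } catch (ClosedChannelException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ClosedChannelException thrown: " + registerAfterClose());
    }
}
```

If such an exception is not handled inside the loop that performs the registration, the thread running that loop terminates, which would be consistent with the trace above ending in Thread.run.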

The leak is in com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup#pendingRegistrations

Pending registrations are accumulating and eventually causing OOM.

It seems that com.mongodb.connection.TlsChannelStreamFactoryFactory.TlsChannelStream#openAsync continues to create AsynchronousTlsChannel instances, adding new pending registrations, even after com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup#loop has shut down.
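
The suspected pattern can be sketched in isolation. The code below is a simplified model under stated assumptions, not the actual AsynchronousTlsChannelGroup implementation: a queue of pending registrations is drained only by a loop, the loop stops after an unhandled failure, and producers keep enqueueing without checking whether the loop is still alive. All class, field, and method names here are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical model of the suspected leak; names do not match the driver's internals.
public class PendingRegistrationLeakSketch {
    private final Queue<Object> pendingRegistrations = new ConcurrentLinkedQueue<>();
    private volatile boolean loopAlive = true;

    /** Producer side: called for every newly opened channel; it never checks loopAlive. */
    void registerSocket(Object channel) {
        pendingRegistrations.add(channel);
    }

    /** Consumer side: one iteration of the loop that drains pending registrations. */
    void loopIteration(boolean failRegistration) {
        if (!loopAlive) {
            return; // models a terminated loop thread: nothing drains the queue anymore
        }
        try {
            while (pendingRegistrations.poll() != null) {
                if (failRegistration) {
                    // stands in for the ClosedChannelException thrown by register(...)
                    throw new IllegalStateException("simulated registration failure");
                }
            }
        } catch (RuntimeException e) {
            loopAlive = false; // the failure ends the loop for good
        }
    }

    int pendingCount() {
        return pendingRegistrations.size();
    }

    /** Kills the loop with one failing registration, then enqueues n more registrations. */
    static int simulate(int n) {
        PendingRegistrationLeakSketch group = new PendingRegistrationLeakSketch();
        group.registerSocket(new Object());
        group.loopIteration(true); // the loop dies here
        for (int i = 0; i < n; i++) {
            group.registerSocket(new Object());
            group.loopIteration(false); // no-op, because the loop is dead
        }
        return group.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("Pending registrations after loop death: " + simulate(1000));
    }
}
```

In this model every registration made after the loop dies stays in the queue, so the queue grows without bound. That matches the symptom described here, although the real cause in the driver may differ in detail.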

It is not clear what triggers this behavior, as we have only observed it in one of our environments. It also takes a considerable amount of time to appear: last time, 2 days passed before it started.

 

Other exceptions were thrown before this one, but hours or days earlier, so they are not necessarily related:

java.lang.NullPointerException: Cannot invoke "com.mongodb.internal.connection.tlschannel.impl.BufferHolder.prepare()" because "this.outEncrypted" is null
at com.mongodb.internal.connection.tlschannel.impl.TlsChannelImpl.wrapAndWrite(TlsChannelImpl.java:393)
at com.mongodb.internal.connection.tlschannel.impl.TlsChannelImpl.write(TlsChannelImpl.java:384)
at com.mongodb.internal.connection.tlschannel.ClientTlsChannel.write(ClientTlsChannel.java:184)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.writeHandlingTasks(AsynchronousTlsChannelGroup.java:540)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.doWrite(AsynchronousTlsChannelGroup.java:498)
at com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup.lambda$processWrite$4(AsynchronousTlsChannelGroup.java:459)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)

 

Another one is: 
error in operation java.lang.NullPointerException

Most probably thrown from {{com.mongodb.internal.connection.tlschannel.async.AsynchronousTlsChannelGroup#processWrite}}



 Comments   
Comment by Githook User [ 06/Jan/22 ]

Author:

{'name': 'Valentin Kovalenko', 'email': 'valentin.male.kovalenko@gmail.com', 'username': 'stIncMale'}

Message: Merge changes from tls-channel for race condition manifested when closing async sockets right after creation (#851)

This is a backport of https://github.com/mongodb/mongo-java-driver/pull/848

JAVA-4417
Branch: 4.4.x
https://github.com/mongodb/mongo-java-driver/commit/eec0eed9a3586fd423c07958b0dba0d48adb6f01

Comment by Githook User [ 05/Jan/22 ]

Author:

{'name': 'Valentin Kovalenko', 'email': 'valentin.male.kovalenko@gmail.com', 'username': 'stIncMale'}

Message: Merge changes from tls-channel for race condition manifested when closing async sockets right after creation (#848)

See the bug report https://github.com/marianobarrios/tls-channel/issues/34
and the two PRs from which the changes were manually merged:

JAVA-4417
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/9ecb895f4aac814b2166b854811e8321bd99aa66

Comment by Valentin Kavalenka [ 08/Dec/21 ]

Hi d4niel.imre@gmail.com,

Just giving you a quick update. The problem stems from the tls-channel library that we shade in the driver. I reported the issue in detail and proposed a solution. Will see what the author thinks about it.

Comment by Jeffrey Yemin [ 01/Dec/21 ]

Hi d4niel.imre@gmail.com,

Thanks for letting us know. We will look into it and get back to you with any further questions.
