[JAVA-3753] GetMore not closing cursor after exception leading mongos to OOM Created: 02/Jun/20  Updated: 27/Oct/23  Resolved: 29/Jul/20

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vinicius Grippa Assignee: Ross Lawley
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to JAVA-3774 Ensure connectionSource is released i... Closed

 Description   

Consider the getMore() method in QueryBatchCursor:

 

https://github.com/mongodb/mongo-java-driver/blob/master/driver-core/src/main/com/mongodb/internal/operation/QueryBatchCursor.java#L264

private void getMore() {
    Connection connection = connectionSource.getConnection();
    try {
        if (serverIsAtLeastVersionThreeDotTwo(connection.getDescription())) {
            try {
                initFromCommandResult(connection.command(namespace.getDatabaseName(),
                        asGetMoreCommandDocument(),
                        NO_OP_FIELD_NAME_VALIDATOR,
                        ReadPreference.primary(),
                        CommandResultDocumentCodec.create(decoder, "nextBatch"),
                        connectionSource.getSessionContext()));
            } catch (MongoCommandException e) {
                throw translateCommandException(e, serverCursor);
            }
        } else {
            QueryResult<T> getMore = connection.getMore(namespace, serverCursor.getId(),
                    getNumberToReturn(limit, batchSize, count), decoder);
            initFromQueryResult(getMore);
        }
        if (limitReached()) {
            killCursor(connection);
        }
        if (serverCursor == null) {
            this.connectionSource.release();
            this.connectionSource = null;
        }
    } finally {
        connection.release();
    }
}

The call to killCursor is not protected by a try/catch or try/finally block. If an exception is thrown, the finally block still releases the connection, but the server-side cursor is never killed.

In certain situations, the application can start piling up cursors on mongos until mongos runs out of memory (OOM).
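To make the concern concrete, here is a minimal sketch of the cleanup pattern the description asks for: if fetching the next batch fails, make a best-effort attempt to kill the server-side cursor before rethrowing, and always release the connection. The types SketchConnection, GetMoreCleanupSketch and FailingConnection are hypothetical stand-ins written for illustration; they are not the driver's internal API, and this is not the change the driver team made.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a driver connection; NOT the real
// com.mongodb.internal.connection.Connection API.
interface SketchConnection {
    List<Long> getMore(long cursorId);   // may throw on network or command errors
    void killCursor(long cursorId);      // may also throw
    void release();
}

final class GetMoreCleanupSketch {
    // Fetches the next batch. On failure, tries to kill the server-side cursor
    // so it does not linger on mongos until the idle-cursor timeout, then
    // rethrows the original error. The connection is released in every case.
    static List<Long> getMore(SketchConnection connection, long cursorId) {
        try {
            return connection.getMore(cursorId);
        } catch (RuntimeException e) {
            try {
                connection.killCursor(cursorId);
            } catch (RuntimeException killFailure) {
                // Keep the getMore error as the primary exception.
                e.addSuppressed(killFailure);
            }
            throw e;
        } finally {
            connection.release();
        }
    }

    public static void main(String[] args) {
        FailingConnection connection = new FailingConnection();
        try {
            getMore(connection, 42L);
        } catch (IllegalStateException e) {
            System.out.println("getMore failed; killed cursors = " + connection.killedCursors
                    + ", releases = " + connection.releaseCount);
        }
    }
}

// Hypothetical test double: getMore always fails, cleanup calls are recorded.
final class FailingConnection implements SketchConnection {
    final List<Long> killedCursors = new ArrayList<>();
    int releaseCount = 0;

    @Override
    public List<Long> getMore(long cursorId) {
        throw new IllegalStateException("simulated getMore failure");
    }

    @Override
    public void killCursor(long cursorId) {
        killedCursors.add(cursorId);
    }

    @Override
    public void release() {
        releaseCount++;
    }
}
```

Rethrowing the original exception and attaching any kill failure with addSuppressed keeps the caller's error unchanged while still recording that cleanup also failed.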



 Comments   
Comment by Ross Lawley [ 01/Jul/20 ]

Hi,

I've added JAVA-3774 for the connection source release.

If the killCursors command fails, the cursors may still exist on mongos, where they will eventually be killed by the cursor timeout process. I'm not sure there is anything the driver could do here, but I'll wait to see whether we can reproduce the issue.

Ross

Comment by Vinicius Grippa [ 20/Jun/20 ]

Hi Ross,

 

Sorry for the late response. I'm trying to put together a reproducible case for you, but I don't have one yet. What I can observe is that the abandoned cursors on mongos are only killed once they reach the idle-cursor timeout (cursorTimeoutMillis, 10 minutes). As a consequence, several cursors stay open for minutes while the application keeps creating more.
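A rough way to see why a 10-minute timeout lets cursors accumulate is Little's law (L = λ × W): if abandoned cursors are created at a steady rate λ and each lives roughly W = cursorTimeoutMillis after its last use, about λ × W of them are open at any moment. The sketch below applies that formula; the class name and the leak rate of 5 cursors per second are hypothetical, chosen only to show the arithmetic, and the real lifetime can be slightly longer than the timeout because the server reaps idle cursors periodically.

```java
final class CursorBacklogEstimate {
    // Little's law: average number open = arrival rate x average lifetime.
    // Assumes each abandoned cursor lives roughly cursorTimeoutMillis.
    static double steadyStateOpenCursors(double leakedCursorsPerSecond, long cursorTimeoutMillis) {
        double lifetimeSeconds = cursorTimeoutMillis / 1000.0;
        return leakedCursorsPerSecond * lifetimeSeconds;
    }

    public static void main(String[] args) {
        long cursorTimeoutMillis = 600_000L; // 10 minutes, the value mentioned above
        double leakRate = 5.0;               // hypothetical: 5 abandoned cursors per second
        System.out.println(steadyStateOpenCursors(leakRate, cursorTimeoutMillis)); // prints 3000.0
    }
}
```

Because the backlog scales linearly with both the leak rate and the timeout, even a modest leak can keep thousands of idle cursors resident on mongos.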

 

Let me know if this helps otherwise I will keep working to see if I can get a reproducible case.

Comment by Ross Lawley [ 19/Jun/20 ]

Hi vgrippa@gmail.com,

Just chasing this up to see if you could provide any more information.

Kind Regards,

Ross

Comment by Ross Lawley [ 10/Jun/20 ]

Hi vgrippa@gmail.com,

Thanks for the ticket. I'm not sure I fully understand. Are you seeing errors where the killCursor call itself fails? Do you have any logs?

I can see that if killCursor(connection) throws an error, the connectionSource isn't released. Do you think that is the cause of the OOM errors you are seeing?
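The gap described here can be closed by nesting finally blocks so that the connection source is released even when killCursor throws. The following is a minimal sketch of that pattern under stated assumptions: RefCountedSource, KillableConnection, ReleaseOnFailureSketch and the test doubles are hypothetical types, not the driver's ConnectionSource or Connection, and this is not necessarily how JAVA-3774 was implemented.

```java
// Hypothetical stand-ins; NOT the driver's internal ConnectionSource/Connection types.
interface RefCountedSource {
    void release();
}

interface KillableConnection {
    void killCursor(long cursorId); // may throw
    void release();
}

final class ReleaseOnFailureSketch {
    // Mirrors the tail of getMore(): when the limit is reached, kill the cursor
    // and give back the connection source. The inner finally releases the source
    // even if killCursor throws; the outer finally always releases the connection,
    // as the original code already does.
    static void finishCursor(KillableConnection connection, RefCountedSource source, long cursorId) {
        try {
            try {
                connection.killCursor(cursorId);
            } finally {
                source.release();
            }
        } finally {
            connection.release();
        }
    }
}

// Hypothetical test doubles that count releases.
final class CountingSource implements RefCountedSource {
    int releaseCount = 0;

    @Override
    public void release() {
        releaseCount++;
    }
}

final class ThrowingKillConnection implements KillableConnection {
    int releaseCount = 0;

    @Override
    public void killCursor(long cursorId) {
        throw new IllegalStateException("simulated killCursors failure");
    }

    @Override
    public void release() {
        releaseCount++;
    }
}
```

The killCursor failure still propagates to the caller; the nesting only guarantees that neither resource is leaked on the way out.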

Ross

Generated at Thu Feb 08 09:00:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.