[JAVA-2778] Implicit sessions are sometimes leaked Created: 06/Feb/18  Updated: 08/Jan/24  Resolved: 12/Feb/18

Status: Closed
Project: Java Driver
Component/s: Query Operations
Affects Version/s: 3.6.0, 3.6.1, 3.6.2
Fix Version/s: 3.6.3

Type: Bug Priority: Critical - P2
Reporter: Stephen Machnowski Assignee: Jeffrey Yemin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cpu.png     PNG File serverStatus.png    
Issue Links:
Depends
is depended on by DRIVERS-453 Decrease likelihood of implicit sessi... Closed
Related
is related to SERVER-33158 Logical Session refresh batches are t... Closed
Case:

 Description   

ISSUE DESCRIPTION
The 3.6 driver added support for implicit sessions as well as a session pool.

This issue tracks several bugs with implicit session management that cause sessions to be leaked on some query operations.

IMPACT
Applications that are exposed to this leak may first notice it by observing SERVER-33158. Messages like the ones below will appear in the server logs:

2018-02-06T13:18:44.855+0000 E -        [thread70181] Assertion: 10334:BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true } src/mongo/bson/bsonobj.cpp 101
2018-02-06T13:18:44.925+0000 I CONTROL  [thread70181] Failed to refresh session cache: Location10334: BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true }

EXAMPLES
A couple of examples which demonstrate the leaks can be found below.

A MongoCursor that is not closed will leaks an implicit session

        MongoCursor<Document> iterator = collection.find().iterator();
        while (iterator.hasNext()) {  // cursor is not closed
            iterator.next();
        }

Code which uses a cursor and properly closes it are not affected by the leak:

        try (MongoCursor<Document> iterator = collection.find().iterator()) {
            while (iterator.hasNext()) {
                iterator.next();
            }
        }
        // or
        collection.find().forEach((Block<Document>) document -> {
            // do something ...
        });
        // or
        ArrayList<Document> documents = collection.find().into(new ArrayList<>());

Any use of the first() method (or findOne() in the DBCollection API):

        collection.find().first();    // session is not properly released to the pool

The only workaround for this manifestation of the leak is to replace with explicit use of a MongoCursor, as above.

Original Description

We recently upgraded our production replica set from MongoDB 3.4.11 to 3.6.2. The following error occurs every couple minutes and is output in the Mongo log on the primary. We have also found system.sessions is empty using $listSessions on the config db and an authorized user. On the other hand $listLocalSessions returns many in-memory sessions.

2018-02-06T13:18:44.855+0000 E -        [thread70181] Assertion: 10334:BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true } src/mongo/bson/bsonobj.cpp 101
2018-02-06T13:18:44.925+0000 I CONTROL  [thread70181] Failed to refresh session cache: Location10334: BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true }



 Comments   
Comment by Jeffrey Yemin [ 13/Feb/18 ]

smachnowski@vocabulary.com: fixes have been included in 3.6.3, released today. Please let us know if you continue to experience this issue after upgrading.

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Add Javadoc for cursors indicating that they should be closed using, for example, try-with-resources.
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/31a09210f8f02c6fe2f7e1b47771582f562694a4

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/456de3c8846d6113e0fe865f74c254d2aa2d2050

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver-async
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/3c15a5fb2f6ada9b5e115b732cb862a88da17256

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver-core
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/c2656babc711331758a4435bc1c1fc2f7944f630

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source after asynchronous write retry
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/97776a9a0d5e557bc74b07b8de7729f4846623e0

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release binding after retrying a change stream operation
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/c153818e4f0c34763ac6832007569fd826b35653

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release AsyncConnectionSource in ListDatabasesOperation
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/a2774014b6d23a0440edf8b669f378a48d71d715

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source for async cursor as soon as the last batch has been retrieved
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/7f36209e7625f2ea15dd3688459480c319750db8

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source for cursor as soon as the last batch has been retrieved
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/0b40e6cb983d8e547ea541ede2ac4ac262590d94

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Add Javadoc for cursors indicating that they should be closed using, for example, try-with-resources.
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/6076d61443182331fbc4edee16dbc3eb4ec36b3f

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/4976da51405e57e0c361ceb729f552efacc59a07

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver-async
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/0e90dd44f3d165170605949a9dd0888963a59ee4

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Test that sessions and bindings have been properly released after every functional test in driver-core
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/cd7b1e18bb9648bb920c79ce743e18d282ea0de7

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source after asynchronous write retry
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/0a6781cb652236b5eda8f81e4e34bceafe6c0809

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release binding after retrying a change stream operation
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/c5035a9fb7ed840a9789b758edcb97e514e05d39

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release AsyncConnectionSource in ListDatabasesOperation
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/ff5db7411982699877f2452ecdb523fce8a6e419

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source for async cursor as soon as the last batch has been retrieved
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/bc4894972185d35fc7b9f18c7a271a505c396dac

Comment by Githook User [ 12/Feb/18 ]

Author:

{'email': 'jeff.yemin@10gen.com', 'name': 'Jeff Yemin', 'username': 'jyemin'}

Message: JAVA-2778: Release connection source for cursor as soon as the last batch has been retrieved
Branch: 3.6.x
https://github.com/mongodb/mongo-java-driver/commit/fc6149e10cd6ca3a08cae057cc5959c27b2a3387

Comment by Jeffrey Yemin [ 07/Feb/18 ]

Thanks for the feedback. It's true that DBCursor still has a finalize method that will kill an open cursor on the server if the close method hasn't already been called and the cursor hasn't already been exhausted. But finalizers in Java are problematic for a number of reasons, many of which are discussed in "Effective Java, Item 7: Avoid finalizers" (here's one link to the text) and now also in the Java 9 Javadoc for the finalize method. The finalize method has also been deprecated in Java 9.

I looked through our current documentation and can't find any remaining recommendation to not explicitly close cursors. If you're aware of any that we've missed, I'd appreciate it if you point it out.

Going forward: to immediately reduce the session count on the server, I recommend that you downgrade back to the 3.4 Java driver while we work on a fix for a 3.6.3 patch release to address the issue.

Comment by Marc Tinkler [ 07/Feb/18 ]

I’d also like to add that not explicitly closing cursors in Java used to be recommended, for performance reasons, since they would be closed in a separate thread by the Java finalized.

Comment by Mira Carey [ 07/Feb/18 ]

smachnowski@vocabulary.com, you're absolutely correct that there is also a Server issue at play here. I've linked SERVER-33158 to track that work, and we're prioritizing getting a fix out quickly.

Comment by Stephen Machnowski [ 07/Feb/18 ]

Yes, we do have hundreds of uses of com.mongodb.DBCollection#findOne() as well as other DBCollection methods that delegate to the same underlying code.

I think the issue may not be just the driver though. No matter how many active server sessions there are, the mongodb server should be able to issue a refresh (persist) them to the system.sessions collection. The original error message appears to be a fully server-side error in it's session management.

2018-02-06T13:18:44.855+0000 E -        [thread70181] Assertion: 10334:BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true } src/mongo/bson/bsonobj.cpp 101
2018-02-06T13:18:44.925+0000 I CONTROL  [thread70181] Failed to refresh session cache: Location10334: BSONObj size: 19492699 (0x1296F5B) is invalid. Size must be between 0 and 16793600(16MB) First element: 0: { q: { _id: { id: UUID("5447911f-6c78-4238-aeae-c249bec5df73"), uid: BinData(0, F5748B71272004FF160A969FEE42869CBD48FD155B00AFD0EF604F431FFB2F4F) } }, u: { $currentDate: { lastUse: true }, $setOnInsert: { user: { name: "vocab@admin" } } }, upsert: true }

To provide more information, I'd like to note we have reduced the following server parameter to 1 minute (logicalSessionRefreshMinutes). With the current rate local in-memory sessions ($listLocalSessions) are accumulating on the server side, the server is able to "keep up" and persist them, meaning no BSON size assertion errors are thrown because the BSON update object fits into 16MB.

logicalSessionRefreshMinutes: 1    (default was 5 minutes)

I am adding screen shots of the graph from Mongo Cloud Manager displaying how every 5 minutes, processing this massive 30+ million local session cache containing orphaned/expired sessions affected performance.

Comment by Jeffrey Yemin [ 06/Feb/18 ]

As this now appears to be an issue with the Java driver I've moved it to the JAVA project.

Comment by Jeffrey Yemin [ 06/Feb/18 ]

Hi smachnowski@vocabulary.com,

We found a session leak in the Java driver's implementation of MongoIterable#first in com.mongodb.FindIterableImpl, which would typically be encountered via a call like:

MongoCollection<Document> collection = ...
collection.find().first();

A similar leak exists in com.mongodb.DBCollection#findOne().

Can you check if your application calls either of those methods? If not, the other likely cause is a situation where either a com.mongodb.DBCursor or a com.mongodb.client.MongoCursor is not closed by the application, e.g.

MongoCursor<Document> cursor = collection.find().iterator();
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}
// cursor not closed!

Please also audit your application for this pattern.

Comment by Stephen Machnowski [ 06/Feb/18 ]

MongoDB servers are running on Amazon Linux (i3.4xlarge):
4.9.58-18.55.amzn1.x86_64 #1 SMP Thu Nov 2 04:38:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Driver is Mongo Java Driver 3.6 from:
http://mongodb.github.io/mongo-java-driver/3.6/driver/

We noticed the error because we are troubleshooting intermittent, sudden spikes of bad performance. This error message preceded the bad performance in the cases we looked at.
We receive an alert that the number of connections to the primary has suddenly grown up to 5 times it's normal average count, and most queries and updates log slow processing times. This usually clears up after a few minutes of bad performance.

Comment by Daniel Pasette (Inactive) [ 06/Feb/18 ]

Hi Stephen. Can you tell use which driver, version and platform you're running on?
Also, other than the error messages you're seeing, what is the observed impact on your cluster?

Comment by Stephen Machnowski [ 06/Feb/18 ]

This is a non-sharded replica set, so there is no config server.

Comment by Ramon Fernandez Marina [ 06/Feb/18 ]

smachnowski@vocabulary.com, can you please upload the logs for the config server primary? I think that's the first step in tracking this down.

Comment by Stephen Machnowski [ 06/Feb/18 ]

And this is a later result from serverStatus() to illustrate the pattern. activeSessionsCount is 4561421, but lastSessionsCollectionJobEntriesRefreshed = 0.

{
        "activeSessionsCount" : 4561421,
        "sessionsCollectionJobCount" : 1719,
        "lastSessionsCollectionJobDurationMillis" : 4900,
        "lastSessionsCollectionJobTimestamp" : ISODate("2018-02-06T14:43:44.327Z"),
        "lastSessionsCollectionJobEntriesRefreshed" : 0,
        "lastSessionsCollectionJobEntriesEnded" : 0,
        "lastSessionsCollectionJobCursorsClosed" : 0,
        "transactionReaperJobCount" : 4023,
        "lastTransactionReaperJobDurationMillis" : 20,
        "lastTransactionReaperJobTimestamp" : ISODate("2018-02-06T14:43:49.227Z"),
        "lastTransactionReaperJobEntriesCleanedUp" : 212
}

Comment by Stephen Machnowski [ 06/Feb/18 ]

From db.serverStatus(), displays we have 2.46M active sessions. We have not called the new Mongo.startSession() feature yet from any client code. The status document below shows the lastSessionsCollectionJobTimestamp was just shortly after we started the server, but the session refresh hasn't completed since then.

        "logicalSessionRecordCache" : {
                "activeSessionsCount" : 2464282,
                "sessionsCollectionJobCount" : 1711,
                "lastSessionsCollectionJobDurationMillis" : 2616,
                "lastSessionsCollectionJobTimestamp" : ISODate("2018-02-06T14:03:44.319Z"),
                "lastSessionsCollectionJobEntriesRefreshed" : 0,
                "lastSessionsCollectionJobEntriesEnded" : 0,
                "lastSessionsCollectionJobCursorsClosed" : 0,
                "transactionReaperJobCount" : 4015,
                "lastTransactionReaperJobDurationMillis" : 10,
                "lastTransactionReaperJobTimestamp" : ISODate("2018-02-06T14:03:46.935Z"),
                "lastTransactionReaperJobEntriesCleanedUp" : 149
        },

Generated at Thu Feb 08 08:58:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.