[SERVER-49000] number of logical sessions is constantly increasing, including for CloudManager agent Created: 22/Jun/20  Updated: 27/Oct/23  Resolved: 16/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.17, 4.0.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Dmitry Agranat
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File numberOfLogicalSessions.png    
Issue Links:
Related
Operating System: ALL
Participants:

 Description   

We are running a mongodb v3.6 replSet, which has been in production for several months (or even years) without any problems so far. However, our database client, which has been using mongodb java driver v3.11 for quite some time, recently got the following error:

ERROR 2020-06-07 18:31:11,705 [ttp-nio-8080-exec-13] Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.mongodb.MongoQueryException: Query failed
 with error code 261 and error message 'cannot add session into the cache' on server mongo-opensearch-03.db00.pro06.eu.idealo.com:27017] with root cause [DirectJDKLog.java:175]
com.mongodb.MongoQueryException: Query failed with error code 261 and error message 'cannot add session into the cache' on server mongo-opensearch-03.db00.pro06.eu.idealo.com:27017
        at com.mongodb.operation.FindOperation$1.call(FindOperation.java:735)

It turned out that the default limit of 1 million logical sessions had been reached. We could not even kill any session:

opensearch:PRIMARY> db.runCommand( { killAllSessions: [ ] } )
{
	"ok" : 0,
	"errmsg" : "cannot add session into the cache",
	"code" : 261,
	"codeName" : "TooManyLogicalSessions"
}
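For reference, the 1 million figure is the default of the maxSessions server parameter. A minimal sketch for inspecting the limit and the session cache's state from the shell (standard commands, shown here for illustration):

// check the session-cache ceiling (maxSessions, default 1000000)
db.adminCommand( { getParameter: 1, maxSessions: 1 } )
// inspect the logical session cache's current state and refresh/reap status
db.serverStatus().logicalSessionRecordCache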

We had to restart mongod in order to reset all logical sessions. After the restart, the session counts were back to normal:

opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "admin@admin", "count" : 1 }
{ "_id" : "mms@admin", "count" : 3 }

We then upgraded from mongodb v3.6 to 4.0.18 to see if this would solve the issue. It did not. After 12 days, the number of logical sessions was still steadily increasing:

opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "opensearch@admin", "count" : 115 }
{ "_id" : "admin@admin", "count" : 2388 }
{ "_id" : "mms@admin", "count" : 1370 }

I'll also attach a screenshot of our dashboard showing that the number of logical sessions is constantly increasing.

The user "opensearch" is our usual database client. The number of logical sessions of 125 is reasonably low.
The user "mms" is mongodb's CloudManager agent. It's been updated recently to the newest version. However, the number of logical sessions of 1370 is quite high already.

Although we have been running dozens of mongodb v3.6 replSets in production for several years now, this has never happened before. Since everything is rolled out automatically and therefore uses an identical configuration, we cannot explain the error and suspect a mongodb bug.



 Comments   
Comment by Kay Agahd [ 08/Sep/21 ]

@Triet Le I just compared the config with the config of another replSet that was running without an increasing number of logical sessions and stumbled upon the above-mentioned parameter sharding.clusterRole: shardsvr, which didn't exist there.

Comment by Triet Le [ 07/Sep/21 ]

@kay.agahd how did you come up with this? 

Comment by Eliton Gadotti [ 28/Jun/21 ]

Hello guys.

I'm here just to thank you all, especially @kay.agahd, for the solution. We had the same problem in our cluster.

Maybe there could be some kind of validation or error message when modifying the config file.

For example, in our case we have a replica set but no sharding, so it shouldn't be possible to set sharding config options.

Thank you all again

Cheers,

Eliton.

Comment by Dmitry Agranat [ 16/Jul/20 ]

Thanks kay.agahd@idealo.de for the update. I will go ahead and close this ticket.

Comment by Kay Agahd [ 15/Jul/20 ]

Hi guys,

I found the reason for the constantly increasing number of sessions. I had to remove the following parameter from the config and restart mongod:

sharding.clusterRole: shardsvr

Since then, the number of sessions has been almost constant, only fluctuating slightly.
I think this setting was wrongly copied & pasted from a shard config. And because it had been working for such a long time already, nobody suspected a misconfiguration.
Maybe mongod could print a warning when it detects such a misconfiguration, as it already does for other detected misconfigurations.
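In the meantime, the effective role of a node can be checked from the shell. A minimal sketch using the standard getCmdLineOpts admin command:

// verify whether the node was started with a sharding cluster role
db.adminCommand( { getCmdLineOpts: 1 } ).parsed.sharding
// absent (undefined) on a replSet member that is not part of a sharded cluster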
So, you can close this ticket. Thanks for your time!

Comment by Kay Agahd [ 10/Jul/20 ]

Hi Dmitry,

No, this deployment was never part of a sharded cluster in the past.
Also, the config.system.sessions collection does not exist:

MongoDB server version: 3.6.17
opensearch:PRIMARY> use config
switched to db config
opensearch:PRIMARY> db.system.sessions.stats()
{
	"ns" : "config.system.sessions",
	"ok" : 0,
	"errmsg" : "Collection [config.system.sessions] not found."
}
opensearch:PRIMARY>
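An equivalent check, as a sketch: listing the config database's collections should show no system.sessions entry on this node:

db.getSiblingDB("config").getCollectionNames()
// "system.sessions" should not appear in the output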

This is the number of current sessions on the primary:

opensearch:PRIMARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "mms@admin", "count" : 15125 }
{ "_id" : "admin@admin", "count" : 8184 }
{ "_id" : "opensearch@admin", "count" : 264003 }

Comment by Dmitry Agranat [ 09/Jul/20 ]

Thanks kay.agahd@idealo.de, a couple of follow-up questions:

  • Was this deployment a part of a sharded cluster in the past?
  • Does the config.system.sessions collection exist?

Comment by Kay Agahd [ 08/Jul/20 ]

Hi Dmitry,

here are the results from the query you asked me to execute again:

opensearch:PRIMARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "mms@admin", "count" : 14970 }
{ "_id" : "admin@admin", "count" : 8115 }
{ "_id" : "opensearch@admin", "count" : 254256 }
opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "opensearch@admin", "count" : 230 }
{ "_id" : "mms@admin", "count" : 3177 }
{ "_id" : "admin@admin", "count" : 5646 }
opensearch:SECONDARY> db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )
{ "_id" : "opensearch@admin", "count" : 70 }
{ "_id" : "admin@admin", "count" : 1577 }
{ "_id" : "mms@admin", "count" : 3438 }

The version of the mms agents was 10.14.6.6374 at the time I wrote this ticket but has now been updated to the latest version (10.15.2.6472).
The link to our project is https://cloud.mongodb.com/v2/4f5f582287d1d86fa8b88186
The link to the replSet this ticket is about is: https://cloud.mongodb.com/v2/4f5f582287d1d86fa8b88186#metrics/replicaSet/5e50561fedfef105da6670ab/overview
We verified our software again and couldn't find any unclosed cursors.

Comment by Dmitry Agranat [ 08/Jul/20 ]

Hi kay.agahd@idealo.de, I am taking over this case from Carl and getting myself familiar with the issue.

It is not clear at the moment where the session leak is coming from: the driver, the server, or the CM agent.

Could you execute this command again on all 3 members and post the results here? It will give us another indication (since the last time this command was executed) of which user is responsible for creating the most sessions:

db.aggregate( [ { $listLocalSessions: { allUsers: true } },{$group:{_id:"$user",count:{$sum:1}}} ] )

Also, can you check how your application uses these cursor classes? We'd like to make sure there is no situation where a com.mongodb.DBCursor or a com.mongodb.client.MongoCursor is not closed by the application, e.g.:

MongoCursor<Document> cursor = collection.find().iterator();
while (cursor.hasNext()) {
    System.out.println(cursor.next());
}
// cursor not closed!
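For completeness, a leak-free variant of the same loop, as a sketch: com.mongodb.client.MongoCursor implements Closeable, so try-with-resources closes it even if iteration throws.

// preferred: the cursor is closed automatically when the block exits
try (MongoCursor<Document> cursor = collection.find().iterator()) {
    while (cursor.hasNext()) {
        System.out.println(cursor.next());
    }
}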

We know that the mongod version is 4.0.18 and the Java driver version is 3.11; what is the CM agent version? You can post a link to your project so we can have a look.

Comment by Kay Agahd [ 07/Jul/20 ]

May I ask if you need any additional information for your investigation?

Comment by Kay Agahd [ 24/Jun/20 ]

The requested log and diagnostic files of the other two replSet members have now also been uploaded.

Comment by Carl Champain (Inactive) [ 23/Jun/20 ]

Yes, please provide the logs and diagnostic files for the other two members!

Comment by Kay Agahd [ 23/Jun/20 ]

Hi carl.champain,

I've uploaded the requested files. I can also send the log and diagnostic files of the two other replSet members if you need them.
Thank you!

Comment by Carl Champain (Inactive) [ 23/Jun/20 ]

Hi kay.agahd@idealo.de,

Thank you for the report.

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) covering this behavior and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Kind regards,
Carl
 
