[SERVER-78923] logicalSessionRecordCache.activeSessionsCount is not being flushed, the config.system.sessions collection cannot be opened, and sessions accumulate over time, leading to a QR node crash Created: 13/Jul/23  Updated: 08/Sep/23  Resolved: 08/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Selva ßalaji Assignee: Yuan Fang
Resolution: Done Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File charts.png     PNG File error.png     PNG File session.png    
Issue Links:
Duplicate
is duplicated by SERVER-78922 logicalSessionRecordCache.activeSessi... Closed
Operating System: ALL
Participants:

 Description   

We have a sharded cluster with 3 QR nodes, one PSS config server setup, and 1 shard (1 primary and 4 secondary servers), with version 5.0.7 on all nodes.
We can see that 'logicalSessionRecordCache.activeSessionsCount' keeps increasing and is not being cleared on any of the QR or data nodes. In the serverStatus output, both the number of sessions refreshed and the number of sessions ended during the last refresh are 0.
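The check described above can be sketched in Python. This is a minimal illustration, not the reporter's actual procedure: it assumes a serverStatus document has already been fetched (for example with a driver's `serverStatus` command) and reads the `logicalSessionRecordCache` section from it. The helper name `summarize_session_cache`, the `stalled` heuristic, and the sample values are hypothetical.

```python
from typing import Any, Dict, Mapping


def summarize_session_cache(server_status: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarize the logicalSessionRecordCache section of a serverStatus document.

    Hypothetical helper: the "stalled" flag is only a heuristic for the symptom
    in this ticket (sessions held in the cache while the last refresh job
    refreshed and ended none).
    """
    cache = server_status.get("logicalSessionRecordCache", {})
    active = cache.get("activeSessionsCount", 0)
    refreshed = cache.get("lastSessionsCollectionJobEntriesRefreshed", 0)
    ended = cache.get("lastSessionsCollectionJobEntriesEnded", 0)
    return {
        "active": active,
        "refreshed": refreshed,
        "ended": ended,
        "stalled": active > 0 and refreshed == 0 and ended == 0,
    }


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the reporter's cluster.
    sample = {
        "logicalSessionRecordCache": {
            "activeSessionsCount": 250000,
            "lastSessionsCollectionJobEntriesRefreshed": 0,
            "lastSessionsCollectionJobEntriesEnded": 0,
        }
    }
    print(summarize_session_cache(sample))
```

Running the helper against serverStatus output from each QR and data node at intervals would show whether the count only ever grows while the refresh counters stay at 0.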
 

 
So we checked the 'config.system.sessions' collection, and it returned the error "No chunks were found for the collection config.system.sessions". The same error message also appears in the log.
 

 
We saw an increase in connections on the sharded cluster, and 'logicalSessionRecordCache.activeSessionsCount' rose to the maximum limit (1000000), at which point mongo became unresponsive.
 
 
 
To check session-related details we need to access the 'config.system.sessions' collection, which is not accessible and returns the error mentioned above.
We need a way to access the 'config.system.sessions' collection, and an explanation of why 'logicalSessionRecordCache.activeSessionsCount' keeps increasing and is not being cleared (flushed).



 Comments   
Comment by Yuan Fang [ 08/Sep/23 ]

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Yuan Fang [ 15/Aug/23 ]

Hi balaji@mafiree.com, we still need additional information to diagnose the problem. If this is still an issue for you, would you please provide what we've requested in this comment? Thank you. 

Comment by Yuan Fang [ 19/Jul/23 ]

Hi balaji@mafiree.com,

Thank you for your report. I understand that your cluster encountered stalls and that you observed zero logical session refreshes during the incident period. We need more diagnostic data to investigate this further. I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time.

For each node in the replica set spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:

  • the mongod/mongos logs
  • the $dbpath/diagnostic.data directory (the contents are described here)

Regards,
Yuan
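
The archiving step requested in the comment above could be done along these lines. This is a minimal sketch using Python's standard `tarfile` module; the function name `archive_diagnostics`, the output file name, and the example paths are hypothetical and need to be adapted for each node.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def archive_diagnostics(dbpath: str, log_paths: Iterable[str], out_file: str) -> Path:
    """Bundle $dbpath/diagnostic.data and the given log files into one .tar.gz.

    Hypothetical helper: paths that do not exist are skipped silently.
    """
    with tarfile.open(out_file, "w:gz") as tar:
        diag = Path(dbpath) / "diagnostic.data"
        if diag.is_dir():
            # Adds the directory and its contents recursively.
            tar.add(str(diag), arcname="diagnostic.data")
        for log in log_paths:
            log_file = Path(log)
            if log_file.is_file():
                tar.add(str(log_file), arcname=f"logs/{log_file.name}")
    return Path(out_file)


if __name__ == "__main__":
    # Example paths are assumptions; substitute the node's real dbpath and log files.
    archive_diagnostics(
        "/var/lib/mongo",
        ["/var/log/mongodb/mongod.log"],
        "node1-diagnostics.tar.gz",
    )
```

One archive per node keeps the logs and diagnostic.data from the same host together, which makes it easier to line them up against the incident window.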

Generated at Thu Feb 08 06:39:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.