[SERVER-38342] Cache hits 95% and performance degrades (like SERVER-26055) Created: 30/Nov/18  Updated: 29/Jan/19  Resolved: 03/Dec/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Paul Ridgway Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File workload.png    
Operating System: ALL
Participants:

 Description   

Recently we have found that the primary is hitting 95% cache usage and query responses become very slow.

Nothing material is known to have changed (i.e. data, size, workload, etc.).

I have diagnostics saved from the time of the event and can provide other details as requested.

 

Following a step down the cache recovered to 80% within a few minutes.
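
For readers who want to reproduce figures like the 95% and 80% above, the following is a minimal sketch of how a cache fill percentage can be derived from the `wiredTiger.cache` section of a `serverStatus` document. The function name `cache_fill_percent` and the sample numbers are hypothetical and not taken from this ticket; the three statistic names are the ones `serverStatus` reports for WiredTiger.

```python
def cache_fill_percent(server_status: dict) -> dict:
    """Return used and dirty cache as a percentage of the configured maximum.

    `server_status` is assumed to have the shape of a serverStatus document,
    e.g. what pymongo's client.admin.command("serverStatus") returns.
    """
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used_bytes = cache["bytes currently in the cache"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    return {
        "used_pct": 100.0 * used_bytes / max_bytes,
        "dirty_pct": 100.0 * dirty_bytes / max_bytes,
    }


# Illustrative values only (hypothetical, not from the uploaded diagnostics).
sample = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 1000,
            "bytes currently in the cache": 950,
            "tracked dirty bytes in the cache": 50,
        }
    }
}

if __name__ == "__main__":
    print(cache_fill_percent(sample))
```

Sampling this periodically and logging the result alongside application metrics may help correlate cache pressure with workload changes.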



 Comments   
Comment by Kelsey Schubert [ 05/Dec/18 ]

Hi paul.ridgway,

I'd recommend checking out our docs on profiling and currentOp, looking at the logs, and possibly increasing the log verbosity. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.

See also our Technical Support page for additional support resources.

Kind regards,
Kelsey

Comment by Paul Ridgway [ 05/Dec/18 ]

Thanks, will take a look. We are struggling to identify which kinds of queries were running at the time of the incident. Is there any advice you can give there?
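
One possible way to narrow down which operations are running during such an event is to capture currentOp output while the slowdown is happening and filter it for long-running active operations. The sketch below assumes a document with the shape returned by the shell's `db.currentOp()` (an `inprog` array whose entries carry `opid`, `active`, `op`, `ns` and `secs_running`); the function name `long_running_ops`, the 5-second threshold and the sample data are hypothetical.

```python
def long_running_ops(current_op: dict, min_secs: int = 5) -> list:
    """Return active operations running for at least `min_secs`, slowest first."""
    ops = [
        op
        for op in current_op.get("inprog", [])
        if op.get("active") and op.get("secs_running", 0) >= min_secs
    ]
    return sorted(ops, key=lambda op: op.get("secs_running", 0), reverse=True)


# Hypothetical sample in the shape of db.currentOp() output.
sample_current_op = {
    "inprog": [
        {"opid": 1, "active": True, "op": "query", "ns": "app.orders", "secs_running": 1},
        {"opid": 2, "active": True, "op": "query", "ns": "app.events", "secs_running": 12},
        {"opid": 3, "active": False, "op": "none", "ns": "", "secs_running": 30},
        {"opid": 4, "active": True, "op": "update", "ns": "app.users", "secs_running": 40},
    ]
}

if __name__ == "__main__":
    for op in long_running_ops(sample_current_op):
        print(op["opid"], op["op"], op["ns"], op["secs_running"])
```

Grouping the filtered operations by `ns` and `op` is one way to see which collections and operation types dominate during the incident window.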

 

Comment by Kelsey Schubert [ 03/Dec/18 ]

Hi paul.ridgway,

I'm seeing a very large change in workload, which I suspect is what caused the cache pressure and subsequent slowdown (see the attached workload.png).

My advice would be to investigate the application layer to determine whether this significant change in workload is expected and, if it is, whether it can be smoothed.

Kind regards,
Kelsey

Comment by Paul Ridgway [ 02/Dec/18 ]

My bad, I made the same mistake as before and removed the @ in the curl command.

All 3 files should be there now.

Comment by Kelsey Schubert [ 01/Dec/18 ]

Hi paul.ridgway,

Would you please try uploading these files again? I'm not seeing them in the portal. If you'd like, you could alternatively attach them to this ticket if there is no personal information contained within the logs.

Thanks,
Kelsey

Comment by Paul Ridgway [ 01/Dec/18 ]

There is a zip from the incident and one from now, which may include the recovery. The log is from now only.

Comment by Paul Ridgway [ 01/Dec/18 ]

Diags and logs uploaded (I think!) - the incident was at 19:24 UTC (7:24 pm UK time) on 30 Nov.

Comment by Kelsey Schubert [ 30/Nov/18 ]

Hi paul.ridgway,

Thanks for the report. So we can continue to investigate, would you please provide the logs and diagnostic.data from the affected node?

I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Kind regards,
Kelsey
