[SERVER-38342] Cache hits 95% and performance degrades (like SERVER-26055) Created: 30/Nov/18 Updated: 29/Jan/19 Resolved: 03/Dec/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.4.18 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Paul Ridgway | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Recently we have found that the primary is hitting 95% cache usage and query responses become very slow. Nothing material is known to have changed (i.e. data, size, workload, etc.).
I have diagnostics saved from the time of the event and can provide other details as requested.
Following a step down the cache recovered to 80% within a few minutes. |
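For readers reproducing the measurement above, here is a minimal sketch of how the cache fill percentage can be derived from a `serverStatus` document. It assumes the two WiredTiger cache counters `bytes currently in the cache` and `maximum bytes configured` under `wiredTiger.cache`; the sample numbers and the 0.95 threshold default are illustrative assumptions and are not taken from this ticket's diagnostics.

```python
def cache_fill_ratio(server_status):
    """Return the WiredTiger cache fill as a fraction between 0 and 1.

    `server_status` is a dict shaped like the output of serverStatus.
    """
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    return used / limit


def is_under_pressure(ratio, trigger=0.95):
    """Flag a fill ratio at or above `trigger`.

    The 0.95 default mirrors the 95% figure reported in this ticket;
    treat it as an assumption and adjust it for your deployment.
    """
    return ratio >= trigger


# Illustrative sample only, not data from the incident.
sample_status = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 9_500,
            "maximum bytes configured": 10_000,
        }
    }
}

ratio = cache_fill_ratio(sample_status)
print(f"cache fill: {ratio:.0%}, under pressure: {is_under_pressure(ratio)}")
```

Sampling this ratio periodically before and after a step down would show whether the fill level recovers in the way described.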
| Comments |
| Comment by Kelsey Schubert [ 05/Dec/18 ] |
|
Hi paul.ridgway, I'd recommend checking out our docs on profiling and currentOp, looking at the logs, and possibly increasing the log verbosity. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources. Kind regards, |
| Comment by Paul Ridgway [ 05/Dec/18 ] |
|
Thanks, will take a look. We are struggling to identify the sort of queries that are running at the time of the slowdown. Is there any advice you can give there?
|
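One possible way to approach the question above is to capture the output of `db.currentOp()` during the slowdown and filter it for long-running operations. The sketch below is not from the ticket. It assumes a document with an `inprog` array whose entries carry `opid`, `active`, `secs_running`, `op`, and `ns` fields; the helper name `slow_operations`, the 5-second cutoff, and the sample entries are hypothetical.

```python
def slow_operations(current_op, min_secs=5):
    """Return active operations running for at least `min_secs` seconds,
    longest-running first.

    `current_op` is a dict shaped like the output of db.currentOp().
    """
    ops = [
        op
        for op in current_op.get("inprog", [])
        if op.get("active") and op.get("secs_running", 0) >= min_secs
    ]
    return sorted(ops, key=lambda op: op["secs_running"], reverse=True)


# Hypothetical sample, for illustration only.
sample_current_op = {
    "inprog": [
        {"opid": 101, "active": True, "secs_running": 42, "op": "query", "ns": "app.events"},
        {"opid": 102, "active": True, "secs_running": 1, "op": "insert", "ns": "app.events"},
        {"opid": 103, "active": False, "secs_running": 100, "op": "none", "ns": ""},
        {"opid": 104, "active": True, "secs_running": 7, "op": "update", "ns": "app.users"},
    ]
}

for op in slow_operations(sample_current_op):
    print(op["opid"], op["op"], op["ns"], op["secs_running"])
```

Grouping the filtered operations by `ns` or `op` may make a shift in workload easier to spot than reading individual entries.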
| Comment by Kelsey Schubert [ 03/Dec/18 ] |
|
Hi paul.ridgway, I'm seeing a very large change in workload, which I suspect is what caused the cache pressure and subsequent slowdown.
My advice would be to investigate the application layer to determine whether this significant change in workload is expected and, if it is, whether it can be smoothed. Kind regards, |
| Comment by Paul Ridgway [ 02/Dec/18 ] |
|
My bad, I made the same mistake before of removing the @ in the curl command.
All 3 files should be there now. |
| Comment by Kelsey Schubert [ 01/Dec/18 ] |
|
Hi paul.ridgway, Would you please try uploading these files again? I'm not seeing them in the portal. If you'd like, you could alternatively attach them to this ticket if there is no personal information contained within the logs. Thanks, |
| Comment by Paul Ridgway [ 01/Dec/18 ] |
|
There is one zip from the incident and one from now, which may include the recovery. The log is as of now only. |
| Comment by Paul Ridgway [ 01/Dec/18 ] |
|
Diags and logs uploaded (I think!) - the incident was at 19:24 UTC (7:24pm UK time) on 30 Nov. |
| Comment by Kelsey Schubert [ 30/Nov/18 ] |
|
Hi paul.ridgway, Thanks for the report. So we can continue to investigate, would you please provide the logs and diagnostic.data from the affected node? I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time. Kind regards, |