[SERVER-51512] cache evict problem Created: 13/Oct/20  Updated: 25/Oct/20  Resolved: 25/Oct/20

Status: Closed
Project: Core Server
Component/s: Concurrency, Performance, Stability
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: vinllen chen Assignee: Dmitry Agranat
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2020-10-13 at 11.28.18 AM.png     PNG File Screen Shot 2020-10-13 at 11.29.09 AM.png     PNG File Screen Shot 2020-10-13 at 11.29.32 AM.png     PNG File Screen Shot 2020-10-13 at 11.30.52 AM-1.png     PNG File Screen Shot 2020-10-13 at 11.30.52 AM.png     PNG File Screen Shot 2020-10-13 at 11.31.26 AM.png     PNG File Screen Shot 2020-10-13 at 11.31.49 AM.png     PNG File Screen Shot 2020-10-13 at 11.32.04 AM.png     File log+ftdc.tar.gz    
Issue Links:
Duplicate
is duplicated by SERVER-50365 Stuck with long-running transactions ... Closed
Operating System: ALL
Participants:

 Description   

MongoDB version: 4.2, replica set
Problem: cache eviction on the primary node does not work.

The user's write operations were blocked because the WiredTiger write tickets ran out. At 2020-10-07 01:24, the user's insert traffic increased, so CPU usage, dirty cache, and several other metrics rose (the dirty cache percentage exceeded 20%), and some slow-log entries were generated. However, after 01:27 there was almost no traffic, yet the write tickets, CPU, dirty cache, and the WiredTiger "hazard pointer check entries walked" metric remained abnormal. It looks like cache eviction was not working, but the CPU profile shows that the eviction threads were still running.
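For reference, the two symptoms above (exhausted write tickets and a dirty cache above 20%) can be read from a `serverStatus` document. The following is a minimal sketch, assuming the 4.2 `serverStatus` layout (`wiredTiger.concurrentTransactions.write` and `wiredTiger.cache`). The function name `summarize_wt_pressure` and the returned keys are hypothetical and chosen only for illustration.

```python
def summarize_wt_pressure(server_status, dirty_threshold_pct=20.0):
    """Summarize write-ticket and dirty-cache pressure from a serverStatus dict.

    dirty_threshold_pct defaults to the 20% figure mentioned in this report.
    """
    wt = server_status["wiredTiger"]
    write_tickets = wt["concurrentTransactions"]["write"]
    cache = wt["cache"]

    max_bytes = cache["maximum bytes configured"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    dirty_pct = 100.0 * dirty_bytes / max_bytes

    return {
        "write_tickets_available": write_tickets["available"],
        "write_tickets_total": write_tickets["totalTickets"],
        # True when every write ticket is checked out.
        "tickets_exhausted": write_tickets["available"] == 0,
        "dirty_pct": dirty_pct,
        "dirty_over_threshold": dirty_pct > dirty_threshold_pct,
    }
```

With pymongo, the input could come from `client.admin.command("serverStatus")`. Polling it after the traffic drops would show whether the tickets and dirty percentage recover.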

The situation was resolved at about 18:00 on 2020-10-07, when the replica set was restarted.

We have also attached the log. It is not the raw log but one collected by our log-gathering platform, because the raw log had already been rotated.

Please let me know if you need more information.



 Comments   
Comment by Dmitry Agranat [ 25/Oct/20 ]

cvinllen@gmail.com, we are actively discussing adding the fix to one of our future 4.2 and/or 4.0 minor releases. I suggest watching SERVER-50365 for updates.

Comment by vinllen chen [ 20/Oct/20 ]

Thanks for your reply. It looks like this patch will be backported to v4.0 and v4.2, but the fix versions do not include any version earlier than v4.4. Is there a plan for that?
 

Comment by Dmitry Agranat [ 13/Oct/20 ]

Hi cvinllen@gmail.com,

It looks like you are hitting SERVER-50365. In addition, I believe that all the custom parameters you are using, plus the very small cache size (and the heavy write workload in question), make things even worse. Whenever you decide to upgrade to take advantage of SERVER-50365, I recommend testing with all the default parameters and a larger cache size.

Thanks,
Dima
