[SERVER-27262] Wiredtiger cache usage is higher than normal status, so eviction thread never sleep Created: 02/Dec/16  Updated: 08/Feb/23  Resolved: 06/Jan/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: 아나 하리 Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File current-op.json, File diagnostics-metrics.tar.gz, Text File iostat.txt, Text File mongostat.txt, Text File pagefault-sarB.txt, File stack.tar.gz, Text File vmstat.txt, Microsoft Word wiredtiger-cache-metrics-1min-delta.xlsx, PNG File wiredtiger-cacheusage.png
Operating System: ALL
Participants:

 Description   

In a two-shard MongoDB cluster, the WiredTiger cache usage on one primary stays at about 90%.
After examining the stack trace, we found that the eviction thread never sleeps and consumes one CPU core all the time.

# top
top - 14:47:31 up 87 days,  3:19,  1 user,  load average: 1.23, 1.26, 1.22
Tasks: 683 total,   1 running, 682 sleeping,   0 stopped,   0 zombie
%Cpu0  :  1.0 us,  1.0 sy,  0.0 ni, 98.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
...
%Cpu10 :  1.0 us,  0.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 :100.0 us,  0.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st <== 
%Cpu12 :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
...

We ran a heavy read query (about 27,000 docs x 30 times) on both primary members.
One primary is fine, but the other is not and is still consuming one CPU core evicting pages.

It looks like cache usage never drops to a stable level (around 80~85%), so the eviction thread never stops scanning pages. I don't know why cache usage never drops to a stable level.
WiredTiger status reports that a lot of blocks are being read into the WiredTiger cache (270MB per 10 sec).
The strange thing is that there are no disk reads, no major faults, and not many minor faults on either primary server. All system metrics (except CPU) are almost the same as on the other (stable) primary.
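
For readers who want to check the same two numbers on their own deployment, here is a minimal sketch (not part of the original report) of how the cache fill percentage and the read-in rate can be derived from two samples of the `wiredTiger.cache` section of `serverStatus`. The stat names are the ones `serverStatus` reports; the helper names, the 85% threshold (taken from the "80~85%" estimate above), and the sample values are assumptions for illustration only, not measurements from this ticket's attachments.

```python
# Stat names as they appear under serverStatus().wiredTiger.cache
USED = "bytes currently in the cache"
MAX = "maximum bytes configured"
READ = "bytes read into cache"

MIB = 1024 ** 2
GIB = 1024 ** 3


def cache_fill_pct(cache):
    """Percentage of the configured WiredTiger cache currently in use."""
    return 100.0 * cache[USED] / cache[MAX]


def read_in_mib_per_sec(prev, curr, interval_sec):
    """Rate at which data was read into the cache between two samples."""
    return (curr[READ] - prev[READ]) / interval_sec / MIB


def above_stable_level(cache, stable_pct=85.0):
    """True if usage is above the level the reporter considers stable.

    stable_pct is an assumption based on the 80~85% figure in the report.
    """
    return cache_fill_pct(cache) > stable_pct


# Hypothetical samples taken 10 seconds apart: a 10 GiB cache that is
# 90% full, with 270 MiB read into the cache during the interval.
prev = {USED: 9 * GIB, MAX: 10 * GIB, READ: 0}
curr = {USED: 9 * GIB, MAX: 10 * GIB, READ: 270 * MIB}

fill = cache_fill_pct(curr)
rate = read_in_mib_per_sec(prev, curr, 10)
print(f"cache fill: {fill:.1f}%  read-in: {rate:.1f} MiB/s  "
      f"above stable level: {above_stable_level(curr)}")
```

In a live check the two dictionaries would come from consecutive `serverStatus` calls rather than from literals; a high read-in rate with no matching disk reads, as described above, would suggest the blocks are being served from the OS file cache.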

According to the stack trace, one thread is running "__tree_walk_internal()"; actually there are 2 such threads, and they take turns consuming one CPU core.



 Comments   
Comment by Kelsey Schubert [ 06/Jan/17 ]

Hi matt.lee,

Since we haven't heard back from you regarding this issue since upgrading to MongoDB 3.2.11, I assume you haven't seen this issue again.

If that's the case, I'd like to close this ticket. If you see this issue again on a later version of MongoDB, please let us know so we can reopen this ticket and continue to investigate.

Kind regards,
Thomas

Comment by 아나 하리 [ 06/Dec/16 ]

Hi Bruce.

This case (the eviction thread being unable to sleep) went away yesterday. (We did not do anything.)
I will upgrade to MongoDB 3.2.11, but I am not sure whether this case will happen again.

Thanks.

Comment by Bruce Lucas (Inactive) [ 05/Dec/16 ]

Hi Matt,

There were also substantial improvements in cache management after 3.2.9. Would you be able to test whether the most recent 3.2 version, 3.2.11, has better eviction behavior on your workload? It also collects additional internal metrics that may help us understand the behavior on your workload better.

Thanks,
Bruce

Comment by 아나 하리 [ 05/Dec/16 ]

@Alexander Gorrod

Sorry, that was a typo.
I actually hit this case on MongoDB 3.2.9.

Could you change it to 3.2.9?

Regards,
Matt.

Comment by Alexander Gorrod [ 05/Dec/16 ]

This ticket references MongoDB 3.0.9. We have made a lot of improvements to WiredTiger cache management (eviction) since the 3.0.9 release. We recommend that you upgrade to the 3.4.0 release of MongoDB - please let us know if your issue has not been resolved in a more recent release.
