[SERVER-27596] WiredTiger very high CPU usage Created: 06/Jan/17 Updated: 30/Jan/17 Resolved: 30/Jan/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.11 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Vincent van Megen | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Debian 7, Xeon D-1521, 32 GB RAM, 2x 480 GB SSD, 1 Gbit networking |
| Attachments: | |
| Participants: | |
| Description |
|
I'm running a 3-member replica set. The average workload is very steady and does not push the load average above 2-3. When the workload increases (for example due to a delay in our processing queue), MongoDB initially processes queries very fast, but after about 5-10 minutes it starts hogging userspace CPU (100%, load average 120-130). Even when the workload is shut down completely, the load average stays at 120-130; the only way to get it back to normal is to run rs.stepDown() and force another member to become primary. That new primary then (usually) processes requests very fast, but sometimes it also climbs back to the very high load average. In the MongoDB log I can see write requests taking a very long time (some up to 10 seconds). I'm not sure how to prevent this, as it completely stalls our whole workload. |
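For reference, the step-down that temporarily clears the load can be issued from the mongo shell as below. This is a minimal sketch; the 60-second step-down window is an illustrative value, not something taken from the ticket.

```javascript
// Minimal sketch: ask the current primary to step down and stay out of
// elections for 60 seconds (illustrative window, adjust to your failover needs).
rs.stepDown(60)

// Afterwards, confirm which member has taken over as PRIMARY.
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});
```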
| Comments |
| Comment by Kelsey Schubert [ 30/Jan/17 ] |
|
Hi vincentvm, It appears that your workload is hitting hardware limits. The workload increase is significant (possibly as high as 10x as many operations at its peak), and it's likely that the system's I/O was initially constrained as the workload spiked. As throughput slows, more work keeps arriving from your applications but cannot be cleared as quickly, causing the WiredTiger cache eviction threads to hit the CPU limit. My advice would be to consider stabilizing your workload or provisioning a more powerful host to resolve this issue. Kind regards, |
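As a rough way to watch the cache pressure described above, the relevant counters can be read from serverStatus in the mongo shell. This is a sketch only; the statistic names below match 3.2-era output, but exact names vary between MongoDB/WiredTiger versions.

```javascript
// Sketch: inspect WiredTiger cache pressure (stat names may differ by version).
var cache = db.serverStatus().wiredTiger.cache;
print("configured max bytes : " + cache["maximum bytes configured"]);
print("bytes in cache       : " + cache["bytes currently in the cache"]);
print("dirty bytes in cache : " + cache["tracked dirty bytes in the cache"]);
// Sustained values near the configured maximum mean eviction threads
// (and eventually application threads) spend CPU trying to free cache pages.
```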
| Comment by Vincent van Megen [ 09/Jan/17 ] |
|
Another screenshot from MongoDB Compass showing the number of queued writes increasing. |
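The same queue depth that Compass charts is also exposed by serverStatus; a quick sketch for reading it from the mongo shell (field layout as in 3.2-era output):

```javascript
// Sketch: currentQueue.writers is the number of write operations waiting,
// which is the metric shown climbing in the Compass screenshot.
var gl = db.serverStatus().globalLock;
printjson(gl.currentQueue);   // { total: ..., readers: ..., writers: ... }
printjson(gl.activeClients);  // operations currently executing
```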
| Comment by Vincent van Megen [ 09/Jan/17 ] |
|
Two screenshots of cloud.mongodb.com statistics taken while this issue was happening. |
| Comment by Vincent van Megen [ 06/Jan/17 ] |
|
I don't have the logs anymore; I uploaded the diagnostic.data directory. It's split into three folders (0, 1, 2); folder 2 is from the primary server. The issue started at around 9am GMT+1 and lasted until about 11am GMT+1 on the 6th of January. |
| Comment by Kelsey Schubert [ 06/Jan/17 ] |
|
Hi vincentvm, If you could upload the complete directory, it may provide us with additional context about your typical workload. I've created a secure portal for you to upload the diagnostic.data as well as the complete mongod logs for each node. Thanks again, |
| Comment by Vincent van Megen [ 06/Jan/17 ] |
|
I have the diagnostic.data directory; where can I upload these files? Should I just upload the diagnostic files from the day this happened? |
| Comment by Kelsey Schubert [ 06/Jan/17 ] |
|
Hi vincentvm, Thanks for reporting this behavior. So we can continue to investigate, would you please provide an archive of the diagnostic.data and complete logs for each node in the replica set? Thank you, |
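For anyone gathering the requested files, the relevant paths can be confirmed from the mongo shell. This is a hedged sketch: which fields appear under "parsed" depends on how mongod was started (config file vs. command line), so the fallbacks below are assumptions.

```javascript
// Sketch: locate the directories being requested. diagnostic.data lives
// under the dbPath; a log file path is only present if file logging is configured.
var opts = db.adminCommand({ getCmdLineOpts: 1 }).parsed;
print("dbPath : " + (opts.storage && opts.storage.dbPath ? opts.storage.dbPath : "(default /data/db)"));
print("log    : " + (opts.systemLog && opts.systemLog.path ? opts.systemLog.path : "(no file log configured)"));
// diagnostic.data is the <dbPath>/diagnostic.data subdirectory on each node.
```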