[SERVER-25883] High iowait, slow queries, lots of free memory Created: 31/Aug/16 Updated: 13/Sep/16 Resolved: 01/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 3.2.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nic Cottrell (Personal) | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
As of a few days ago, one node in my cluster is suddenly generating many slow queries. The machine is showing very high iowait (sustained > 30%), but the CPU usage of mongod never exceeds 1%. The machine is showing 26GB free memory and 0 swap used. This is the only node shared with a mysqld process, but I don't understand why mongod is not taking lots of memory like on all other nodes. |
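For readers who want to reproduce the iowait figure quoted above, the following is a minimal sketch of how a system-wide iowait percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. The function name `iowait_percent` and the sample counter values are illustrative assumptions, not taken from the reporter's machine.

```python
def iowait_percent(before, after):
    """Return the iowait share (0-100) between two aggregate 'cpu' lines
    sampled from /proc/stat. The helper name is hypothetical."""

    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already included in user/nice, so only the first eight
    # fields are summed to avoid counting it twice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


# Illustrative counter values only (not real measurements).
BEFORE = "cpu  100 0 50 800 50 0 0 0 0 0"
AFTER = "cpu  110 0 60 850 80 0 0 0 0 0"
print(iowait_percent(BEFORE, AFTER))
```

On a live system the two lines would be read from the first line of `/proc/stat` a second or more apart.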
| Comments |
| Comment by Kelsey Schubert [ 13/Sep/16 ] |
|
Hi niccottrell, Thank you for the update! I'm glad you were able to identify that a failed drive was the cause of this issue. Please be aware that WiredTiger has no direct visibility of what's happening underneath when mdadm is operating in a degraded state. Therefore, WiredTiger cannot compensate for this issue. Kind regards, |
| Comment by Nic Cottrell (Personal) [ 09/Sep/16 ] |
|
I disabled mysqld completely on this machine and the problem remained, although eventually the memory usage of mongod crept up. I did in fact have a failed drive inside a mirrored software RAID. It was marked as failed inside mdadm, so I don't see why that would cause iowait to jump from a few percent to the high 90s. In any case, after replacing the drive the iowait dropped back to near zero. I can't help thinking there may still be a bug in WiredTiger, which doesn't handle a failed software RAID as well as it could. |
| Comment by Emil Burzo [ 08/Sep/16 ] |
|
It's a big coincidence that I have the exact same issue after upgrading to MongoDB 3.2.9. Graphs and description, here: https://groups.google.com/forum/#!starred/mongodb-user/42wOHqR_o5Q |
| Comment by Ramon Fernandez Marina [ 01/Sep/16 ] |
|
I don't think there's enough evidence to indicate a bug in MongoDB, just that something is happening on your system that's slowing MongoDB down, so I'm going to close this ticket – as you know, the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. Looking at the output of pidstat I don't think the higher I/O coming from mysqld is the issue either, so my guess is that your md device may be rebuilding or doing a consistency check. I'd look at /proc/mdstat to find out more. Note that for MongoDB-related support discussion you can post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. See also our Technical Support page for additional support resources. Thanks, |
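As a rough illustration of the `/proc/mdstat` check suggested above, here is a minimal sketch that scans mdstat-style text for degraded arrays, members flagged `(F)`, and an in-progress rebuild or consistency check. The function name `summarize_mdstat` and the `SAMPLE` text are hypothetical; the sample is not output from the reporter's machine, and the exact layout of `/proc/mdstat` may differ between kernel versions.

```python
import re


def summarize_mdstat(text):
    """Map each md array name to its degraded flag, failed members and
    current sync activity. Hypothetical helper for illustration."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            arrays[current] = {
                "degraded": False,
                # Members marked faulty look like "sda1[0](F)".
                "failed": re.findall(r"(\w+)\[\d+\]\(F\)", header.group(2)),
                "activity": None,
            }
            continue
        if current is None:
            continue
        if not line.strip():
            current = None  # a blank line ends the array's section
            continue
        # Status such as "[2/1] [_U]": an underscore marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status:
            arrays[current]["degraded"] = "_" in status.group(3)
        activity = re.search(r"\b(recovery|resync|check|reshape)\s*=", line)
        if activity:
            arrays[current]["activity"] = activity.group(1)
    return arrays


# Illustrative sample only: a two-disk mirror with one failed member.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      976630336 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""

print(summarize_mdstat(SAMPLE))
```

On a live Linux host the same function could be given the contents of `/proc/mdstat`, for example `summarize_mdstat(open("/proc/mdstat").read())`.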
| Comment by Nic Cottrell (Personal) [ 01/Sep/16 ] |
|
Doesn't seem like a lot of activity:
|
| Comment by Ramon Fernandez Marina [ 01/Sep/16 ] |
|
Is the I/O coming from mongod? Can you try running pidstat -d 1 and posting the output here? Thanks, |
| Comment by Nic Cottrell (Personal) [ 31/Aug/16 ] |
|
Now I'm seeing spikes of 100% cpu for the mongod process |
| Comment by Nic Cottrell (Personal) [ 31/Aug/16 ] |
|
Often in the mongod.log:
|
| Comment by Nic Cottrell (Personal) [ 31/Aug/16 ] |
|
This possibly started just after the 3.2.9-rc0 yum update but continues after the 3.2.9 upgrade and a machine restart. uname -a gives: |