[SERVER-22923] iowait increases slightly after upgrading from MongoDB 3.0 to 3.2 Created: 02/Mar/16 Updated: 15/Sep/16 Resolved: 14/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Maziyar Panahi | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
| Comments |
| Comment by Kelsey Schubert [ 14/Sep/16 ] | ||||||||||||||||||||||||||||||||||||
|
As Ramon explained, a lot of work has been completed to improve the performance of MongoDB and WiredTiger in 3.2. For example, on secondaries we expect that there may be an increase in I/O load as part of the way we sync journal files; this load should self-throttle to keep performance from degrading. From our analysis, the increase of iowait observed in this ticket is minor and does not impact performance. Therefore, I'm closing this ticket. Kind regards, | ||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 21/Jun/16 ] | ||||||||||||||||||||||||||||||||||||
|
After looking at the data we haven't found anything abnormal, and the increase in iowait is really minimal. There was a significant amount of work that went into WiredTiger in 3.2, so it will be very hard to pinpoint the exact change that caused this behavior. Since we haven't seen any negative impact from this increase in iowait overall, I'm lowering the priority of this ticket, but will keep it open to attempt to reproduce this behavior locally and investigate further as time permits. | ||||||||||||||||||||||||||||||||||||
| Comment by Ilya Skriblovsky [X] [ 11/May/16 ] | ||||||||||||||||||||||||||||||||||||
|
> As I understand, the increase in iowait is the only issue you have observed

I've attached the files that you requested. For both 3.0 and 3.2, I ran the scripts after more than 6 hours of operation, when mongod had reached its usual resident memory size. | ||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 10/May/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi IlyaSkriblovsky, Thank you for reporting your observations. As I understand, the increase in iowait is the only issue you have observed when MongoDB was upgraded to 3.2. To continue to investigate this behavior, we will need additional information. Would you please follow the steps below? With MongoDB 3.0 running:
After running this script for an hour, please attach the following to this ticket
With MongoDB 3.2 running:
This will collect iostat data each second, and will help us to correlate the CPU numbers to events recorded in the diagnostic.data. After running this script for an hour, please attach the following to this ticket
Thank you, | ||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 06/May/16 ] | ||||||||||||||||||||||||||||||||||||
|
Reopening to take a closer look at recently uploaded data. | ||||||||||||||||||||||||||||||||||||
| Comment by Ilya Skriblovsky [X] [ 06/May/16 ] | ||||||||||||||||||||||||||||||||||||
|
I've reverted back to 3.0.11 to see whether iowait will drop back to lower values. Here are the results (be sure to view full images, they are wide and may be truncated):
So yes, iowait is low again. But the Cache Activity graph didn't return to how it looked before the 3.0→3.2 upgrade. So the decreased cache activity is probably not related to the MongoDB version; my DB simply needs much more time to warm up the cache. Fortunately, I can switch between 3.0 and 3.2 relatively easily, so please let me know if any diagnostic info from 3.0 or 3.2 might be useful. Please consider reopening this ticket, because higher iowait certainly needs some explanation (or, better, fixing!). | ||||||||||||||||||||||||||||||||||||
| Comment by Ilya Skriblovsky [X] [ 06/May/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi all, I see exactly the same effect after an upgrade from 3.0.11 to 3.2.6: Increased iowait is visible not only in Cloud Manager's graphs, but also in the "wa" percentage in `top` and in the load average, which doubled after the upgrade. Primary configuration:
diagnostics.data.zip: http://microline.ru/img/mongodb-jira/diagnostic.data.zip Even if MongoDB 3.2 works as expected, this seems like a considerable issue, and I'm thinking of downgrading back to 3.0 because of it. Maybe this is caused by some change in caching policy in 3.2? Are there any relevant configuration parameters that can be tuned? | ||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 06/Apr/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi maziyar, From the data you have uploaded we do not see anything to indicate a bug in MongoDB. When you account for the 10 cores on your machine, iowait peaked at about 3% of total CPU activity. This value falls within reasonable operating levels. The log you have provided indicates that you may have run into Thank you, | ||||||||||||||||||||||||||||||||||||
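The per-core normalization described in this comment can be sketched as follows. This is only an illustration, assuming the monitoring graph reports iowait summed across cores (so a 10-core machine could show up to 1000%). The function name and the 30% sample reading are hypothetical and were chosen only to reproduce the "about 3%" figure; they are not taken from the uploaded data.

```python
def normalize_cpu_percent(summed_percent: float, num_cores: int) -> float:
    """Convert a CPU percentage summed across cores into a share of total capacity."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return summed_percent / num_cores


if __name__ == "__main__":
    # Hypothetical reading: 30% iowait summed over 10 cores (assumed values).
    print(normalize_cpu_percent(30.0, 10))
```

Read this way, a graph peak that looks large in isolation corresponds to a small fraction of the machine's total CPU time.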
| Comment by Maziyar Panahi [ 21/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
I forgot to mention: after the upgrade, the secondary just dies with this error and I have to start mongod again. I'm not sure whether this error is related to the iowait, or to the fact that I always see jbd2/vdb-8 (in iotop) on my MongoDB instance with over 40% IO:
Many thanks. | ||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 21/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Ramon, Yes, the iowait is still up. I uploaded a new screenshot. Thanks again Ramon, Best, | ||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 14/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
maziyar, if I understand the data correctly, it seems that the CPU stats climbed up initially, and then everything went back to normal except for iowait: Interestingly enough there's less data going into the cache: I'm still looking at the data you uploaded to see if this change in behavior can be explained from that. | ||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 04/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
iostats from both primary and secondary plus the diagnostics data from both mongodb instances | ||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 04/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi Thomas, Thanks for the reply. I am running the script and will upload iostat for both instances plus the diagnostic directory. Thanks again, | ||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 02/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
maziyar, please also run the following shell script:
This will collect iostat data each second, and will help us to correlate the CPU numbers to events recorded in the diagnostic.data. After running this script for an hour, please upload both the iostat.log and diagnostic.data to this ticket. Thanks again, | ||||||||||||||||||||||||||||||||||||
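The script posted in this comment is not preserved in this export. As a rough sketch of the kind of collection it describes (per-second iostat samples written to iostat.log), something like the following could be used. It assumes the `iostat` tool from the sysstat package is installed; the function names, defaults, and choice of flags are assumptions for illustration and are not the script from the ticket.

```python
import subprocess
from typing import List


def build_iostat_command(interval_s: int = 1) -> List[str]:
    """Build an iostat invocation that reports repeatedly at a fixed interval."""
    if interval_s < 1:
        raise ValueError("interval_s must be at least 1")
    # -k: report in kilobytes, -t: print a timestamp with each report,
    # -x: extended per-device statistics (flag choice is an assumption).
    return ["iostat", "-k", "-t", "-x", str(interval_s)]


def collect_iostat(log_path: str = "iostat.log",
                   interval_s: int = 1,
                   duration_s: int = 3600) -> None:
    """Run iostat for duration_s seconds, writing its output to log_path."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(build_iostat_command(interval_s),
                                stdout=log, stderr=subprocess.STDOUT)
        try:
            # Without a count argument iostat runs until stopped,
            # so wait for the requested duration and then terminate it.
            proc.wait(timeout=duration_s)
        except subprocess.TimeoutExpired:
            proc.terminate()
            proc.wait()
```

Calling `collect_iostat()` with the defaults would sample once per second for an hour; the timestamps in the log are what allow the samples to be lined up against events in diagnostic.data.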
| Comment by Kelsey Schubert [ 02/Mar/16 ] | ||||||||||||||||||||||||||||||||||||
|
Hi maziyar, MongoDB 3.2 introduced a diagnostic data collection mechanism that logs server statistics at periodic intervals. To get a better idea of what is going on, can you please archive the $dbpath/diagnostic.data directory and attach it to this ticket? Thank you, |
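One way to produce the requested archive is sketched below. It is a minimal example, assuming the directory sits directly under the dbpath as described above; the function name, the output file name, and the example dbpath are hypothetical.

```python
import shutil
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_base: str = "diagnostic.data") -> str:
    """Create <out_base>.tar.gz containing the diagnostic.data directory under dbpath."""
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    # root_dir/base_dir keep the top-level "diagnostic.data" folder name
    # inside the archive instead of the full filesystem path.
    return shutil.make_archive(out_base, "gztar",
                               root_dir=dbpath, base_dir="diagnostic.data")
```

For example, `archive_diagnostic_data("/var/lib/mongodb")` (an assumed dbpath) would write `diagnostic.data.tar.gz` to the current working directory and return its path.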