[SERVER-25700] Very high CPU usage Created: 19/Aug/16 Updated: 27/Sep/16 Resolved: 26/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability, WiredTiger |
| Affects Version/s: | 3.2.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Teemu Sirkiä | Assignee: | Kelsey Schubert |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Operating System: | ALL |
| Participants: | |
| Description |
|
I updated Mongo from version 3.2.5 to 3.2.9, and since then Mongo has twice frozen my system by using all the CPU time. There is nothing in Mongo's own logs, but I can see that my own process performed a number of updates both times. There was nothing special about those updates, however, as similar updates are done every 10 seconds. There were about 24 hours between the two incidents, so the problem does not occur very frequently. However, I had to downgrade Mongo back to 3.2.5 to see if this issue goes away. I searched the open issues but couldn't find anything relevant. As this is my production server and there is not much information about what is wrong, it might be quite hard to debug what is happening. |
| Comments |
| Comment by Kelsey Schubert [ 27/Sep/16 ] |
|
Hi colinhowe, We've made a significant number of fixes in MongoDB 3.2.10, which we expect to release early next week, and we have heard that users are seeing substantial improvements (see Thank you for your help, |
| Comment by Kelsey Schubert [ 27/Sep/16 ] |
|
To watchers of this ticket, emilburzo opened |
| Comment by Colin Howe [ 16/Sep/16 ] |
|
Sorry, I only just saw this message, and the dates on the diagnostics indicate that they no longer cover the bad period. |
| Comment by Kelsey Schubert [ 08/Sep/16 ] |
|
Thank you for letting us know that you are encountering a similar issue. To help us diagnose this behavior, would you please each create a new ticket and attach an archive of your diagnostic.data directory? Thank you, |
| Comment by Emil Burzo [ 08/Sep/16 ] |
|
Another "me too" post. I won't copy/paste, as I've already written about it here: https://groups.google.com/forum/#!starred/mongodb-user/42wOHqR_o5Q On the host with regular HDDs (not SSDs), IOWAIT is very significant. |
| Comment by Colin Howe [ 07/Sep/16 ] |
|
Hi, We upgraded from 3.2.7 to 3.2.9 and saw problems similar to those described here. We've now downgraded to 3.2.8, and CPU usage is about 25% of what it was. Not only that, but we were also seeing available tickets drop to 0 and all queries take ~10x as long as normal. If you want to see this in MMS - https://cloud.mongodb.com/v2/56375ad1e4b09259595a25b3#host/replicaSet/56375cd6e4b007ecfe71df99 Something isn't right. |
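The "available tickets dropping to 0" symptom mentioned above can be checked programmatically. The following is a minimal sketch, assuming the `serverStatus` document exposes the WiredTiger ticket counters under `wiredTiger.concurrentTransactions.read` and `.write` with an `available` field, as 3.2-era servers report them. The function name `exhausted_ticket_pools` is hypothetical, and the input is a plain dictionary so the sketch does not depend on any particular driver.

```python
from typing import Any, Dict, List


def exhausted_ticket_pools(server_status: Dict[str, Any]) -> List[str]:
    """Return the ticket pools ("read" and/or "write") that report 0 available tickets.

    Assumes the serverStatus layout wiredTiger.concurrentTransactions.<pool>.available;
    pools that are missing from the document are ignored rather than flagged.
    """
    pools = server_status.get("wiredTiger", {}).get("concurrentTransactions", {})
    return [
        name
        for name in ("read", "write")
        if name in pools and pools[name].get("available") == 0
    ]


# Hypothetical sample shaped like a serverStatus document (values are made up).
sample_status = {
    "wiredTiger": {
        "concurrentTransactions": {
            "read": {"out": 3, "available": 125, "totalTickets": 128},
            "write": {"out": 128, "available": 0, "totalTickets": 128},
        }
    }
}
starved = exhausted_ticket_pools(sample_status)
```

With a driver such as PyMongo, the input document would typically come from running the `serverStatus` command against the admin database; polling it every few seconds would show whether a pool stays at zero during a slow period.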
| Comment by Kelsey Schubert [ 26/Aug/16 ] |
|
Hi ttsirkia, Unfortunately, after examining the logs and diagnostic.data we have not been able to determine the cause of this behavior. I expect that the issue you have encountered is hindering our ability to collect the necessary diagnostic metrics. If you are able to reproduce this issue please let us know and we will reopen this ticket and work with you to collect additional information to help us debug the problem. Please be aware that we are actively working to improve the performance of WiredTiger across a variety of workloads and system architectures. It is likely that a future version of MongoDB will resolve this issue. Unfortunately, at this time, I cannot point you towards a specific ticket to watch for updates. The behavior you are observing appears to be related to cache eviction. To help our investigation, would you please open a new ticket and attach an archive of $dbpath/diagnostic.data directory to it? Thank you, |
| Comment by Teemu Sirkiä [ 24/Aug/16 ] |
|
It is kind of nice to hear that I'm not the only one having this issue and that you, Ravi, can also confirm it. The update queries might be the key to the issue, as most of the queries in my application are updates. Inserts and deletes are much rarer. Fetching data of course occurs, but not as often as updates. -Teemu |
| Comment by Ravi Teja [ 24/Aug/16 ] |
|
Hi All, Our CPU utilization is shown below on an hourly basis for each day; the random spikes at the end are related to upgrading Mongo. Thanks |
| Comment by Teemu Sirkiä [ 23/Aug/16 ] |
|
Hi! I appreciate your work. I just uploaded the relevant Mongo logs. Typically, top shows CPU usage between 0.3% and 2%. After these issues occurred, I wasn't able to log in to the server anymore, but the virtualization monitor reported that the virtual engine running Mongo took 30% of the whole processor capacity. So I suspect that the CPU usage at that point was 100%. As I wasn't able to log in, I cannot be sure that it was Mongo's process that was using the CPU. However, the problem is still very tightly related to this update because the problems disappeared after downgrading. My system is running Node.js and using Mongoose to communicate with the database. I searched the Mongoose and MongoDB driver bug trackers but found nothing interesting. I would be happy to be able to replicate this in my testing environment. If this occurred in that environment, I could incrementally upgrade Mongo towards 3.2.9 to see which version actually causes the problems. Unfortunately, using the production server for debugging purposes is not a very feasible option. -Teemu |
| Comment by Kelsey Schubert [ 23/Aug/16 ] |
|
Hi ttsirkia, Thanks for providing the diagnostic.data. Unfortunately, we have not been able to determine the cause of this behavior yet. To continue to investigate, would you please upload the complete logs covering the time period in which this issue occurred to the same portal? Additionally, would you please clarify how much CPU is typically utilized by the mongod instance? Thanks again, |
| Comment by Teemu Sirkiä [ 21/Aug/16 ] |
|
Some additional information. It took only about 10 hours for the first issue to occur after I updated to 3.2.9, and the second occurred 24 hours after that. Now, after downgrading back to 3.2.5, the system has been running without any problems for over 60 hours. At the same time, I've tried to replicate the issue in my testing environment using Mongo 3.2.9. However, no problems have occurred. The background processes are the same in the testing environment as in the production environment. The main difference is that the number of queries is much smaller because there are no users using the system. This might indicate some kind of race condition that occurs at a certain point when the database is used by both the background processes and the users via the web interface. I hope the diagnostics files bring up something interesting. Meanwhile, I'm running the older version and am not going to upgrade. |
| Comment by Teemu Sirkiä [ 19/Aug/16 ] |
|
Thanks! I uploaded all the files. The first issue occurred yesterday, Aug 18th, around 09:40:15 EEST and the second today, Aug 19th, around 11:03:35 EEST. Mongo was updated to version 3.2.9 on Aug 17th around 23:00 EEST and then downgraded back to 3.2.5 today after the issue. BR, |
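The intervals between the upgrade and the two incidents can be worked out from the timestamps reported above. This is a small sketch that assumes the year 2016 (taken from the ticket dates) and models EEST as a fixed UTC+3 offset; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# EEST modeled as a fixed UTC+3 offset (an assumption; no tz database lookup is done).
EEST = timezone(timedelta(hours=3), "EEST")

# Timestamps as reported in the comment; the upgrade time is approximate ("around 23:00").
upgraded_at = datetime(2016, 8, 17, 23, 0, 0, tzinfo=EEST)
first_issue = datetime(2016, 8, 18, 9, 40, 15, tzinfo=EEST)
second_issue = datetime(2016, 8, 19, 11, 3, 35, tzinfo=EEST)

# Time from the upgrade to the first incident, and between the two incidents.
upgrade_to_first = first_issue - upgraded_at
first_to_second = second_issue - first_issue

print(f"upgrade -> first issue: {upgrade_to_first}")
print(f"first -> second issue:  {first_to_second}")
```

The same timestamps, converted to UTC, are a convenient way to locate the matching window in the server logs and in the diagnostic data.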
| Comment by Kelsey Schubert [ 19/Aug/16 ] |
|
Hi ttsirkia, Providing the complete directory is preferable as it would give us more context around the issue. I've created a secure upload portal for you to use - would you please upload the diagnostic.data there? Thank you for your help, |
| Comment by Teemu Sirkiä [ 19/Aug/16 ] |
|
Sure! It is 98 MB; is it OK to drop some of the older files? BR, |
| Comment by Kelsey Schubert [ 19/Aug/16 ] |
|
Hi ttsirkia, Would you please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket, so we can continue to investigate this issue? Thank you, |
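The archiving step requested above could be scripted as follows. This is a minimal sketch using only the Python standard library; the function name `archive_diagnostic_data` and the default output file name are hypothetical, and the `dbpath` argument stands for whatever data directory the mongod instance is configured with.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_file: str = "diagnostic.data.tar.gz") -> Path:
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar archive and return its path."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    target = Path(out_file)
    with tarfile.open(target, "w:gz") as tar:
        # arcname roots the archive at "diagnostic.data" instead of the full dbpath.
        tar.add(source, arcname="diagnostic.data")
    return target
```

For example, `archive_diagnostic_data("/var/lib/mongodb")` would write `diagnostic.data.tar.gz` to the current working directory, where `/var/lib/mongodb` is only a placeholder for the actual dbpath.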
| Comment by Teemu Sirkiä [ 19/Aug/16 ] |
|
My environment is Linux 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |