[SERVER-19080] High CPU (user usage in MMS) after bulk inserts Created: 23/Jun/15 Updated: 14/Apr/16 Resolved: 31/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin |
| Affects Version/s: | 3.0.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Maziyar Panahi | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | Start a fresh replica set, |
| Participants: |
| Description |
|
Hello, I have a replica set MongoDB 3.0.4 with stable load (60-100 inserts, 20-30 updates, 10 queries). Sometimes I do bulk inserts like 20K 30K and after this the CPU goes high even though I am done with the operation and there is nothing else. In MMS when I look at the CPU of a fresh MongoDB (just started) everything is normal. But when I use bulk write to insert 20k or 40k or any job that uses cache the CPU usage (specially user usage) goes over 100%. The servers are isolated from any other applications and they only use MongoDB and I check with other monitoring tools it's all MongoDB. When I restart the MongoDB it all goes away. How I start my replica: mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB=2 --dbpath /data/ --replSet rs0 --fork --logpath /data/fork.log I also tried different cache size like 5 or 10 and it was the same. The only thing that looks different before bulk inserts and after bulk inserts is the cache usage in MMS. Stats:
Please let me know if you need more information. Thanks |
| Comments |
| Comment by Maziyar Panahi [ 31/Aug/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
MongoDB CPU usage in MMS before and after update to 3.0.6. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 31/Aug/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Sam, I checked this few times. Even without bulk inserts after awhile the CPU (user) usage would go up to 140-150% on both replica members. But no after last update it's been 24hrs which the usage is moderate around 30% on both. I will attach a screenshots before update and after latest update. (don't mind the time that the usage went down to 0%. We had a power outrage and has nothing to do with MongoDB) I will let you know if it comes back but for now everything is ok and I think we can close this issue | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sam Kleinman (Inactive) [ 03/Aug/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry for taking so long to get back to you. While we haven't yet been able to reproduce the behavior you see here, after reviewing this again, it seems plausible that some of the improvements in 3.0.5 may have addressed this issue, particularly the improvements to the performance of capped collections and the oplog. You can see the complete list of changes in 3.0.5 here and the WiredTiger specific changes here. Could you see if this resolves your issues? Regards, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 01/Jul/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Sam, grep -c "processor" /proc/cpuinfo = 10 I did stop all the operations and this is the result.
I also attached one screenshot from the time I stopped all the operations. My question is, how come when you shutdown and start an instance it all goes away even though the operation load is the same as few seconds ago? Update: After this I started replica set without mentioning the cache size. The primary is already using 32G cache. I will let you know if there was any change to this problem. Best, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 01/Jul/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
When the CPU usage drops is the time I stopped all the operations. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sam Kleinman (Inactive) [ 30/Jun/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the additional information. WiredTiger is able to utilize more available CPU resources than MongoDB instances using the MMAPv1 engine, so higher CPU usage is expected. I want to make sure that we're not seeing a case where a system is doing no work (i.e. no operations.) and CPU usage stays consistently high. From the data you've provided it seems that the database is still responsive, and is continuing to do work while it's using CPU which might be expected, to some degree. I want to see if we still see the effect with two additional modifications:
Could you also confirm the number of logical CPU cores (e.g. grep -c "processor" /proc/cpuinfo)? Cheers, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Maziyar Panahi [ 30/Jun/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Sam, Sorry for the delay I can't find the way to get notified by email when I am watching an issue. The MMS account is mine and linked to the same email address as this one. This email address is also the MMS username and group owner: maziyar@live.co.uk and group name: zeus-iscpif I had much larger cache which I was told to bring it down in this issue: https://jira.mongodb.org/browse/SERVER-18231 Here is how I do bulk inserts and it's not that often once in a while I do this and immediately the CPU user usage in MMS goes over the 100%:
Right now the CPU (user usage) is over 100% so I will put some stats here:
I also added few screenshots from MMS. Many thanks, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sam Kleinman (Inactive) [ 24/Jun/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The user from which you opened this ticket is not linked to an MMS account: could you either link this account to the relevant MMS group or provide the group name in this ticket so we can track this issue down. The wired tiger cache size controls the full amount of cache size that the entire MongoDB instance has access. It looks like your data and index sizes are much larger than 2 gigabytes. Are you able to reproduce this issue with much smaller data sizes or a much larger cache size? Could you provide more information about the bulk operations that trigger this case. An example operation, including the write concern and the replication configuration, would be very useful. I'm also interested in actual CPU utilization: When you say over 100%, do you mean that MongoDB is using more than 100% of a single logical CPU core? Is the database responsive during this time? Thanks for the additional background. Regards, |