[SERVER-20927] 100% CPU on mongo 3.0.4 Created: 14/Oct/15  Updated: 09/Jan/16  Resolved: 09/Jan/16

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 3.0.4
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Mike Bartlett Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: iostat.log, ss-10s.log, ss-1s.log, ss.log
Operating System: ALL
Steps To Reproduce:

Run mongo for a while; eventually it breaks and you lose confidence.

 Description   

Hi folks,

We previously commented on https://jira.mongodb.org/browse/SERVER-19485 and realise this may be fixed in subsequent versions, but thought it wise to report it nonetheless and attach ss.log. I didn't run the capture for very long because I had to run stepDown on the server: this is a live production environment, and things were pretty borked during the 100% CPU spike.

As soon as the stepDown occurred, the CPU dropped back down to normal operating parameters.



 Comments   
Comment by Ramon Fernandez Marina [ 23/Nov/15 ]

mydigitalself, in the latest data you uploaded I see peaks of over 400K operations per second and up to 11K connections created per second, so this server may not be powerful enough to handle the load it is being subjected to. In addition, there have been numerous fixes since version 3.0.4, so if this is still an issue for you, can you please try MongoDB 3.0.7 and let us know whether the issue persists?

Thanks,
Ramón.
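Per-second figures like the ones above can be derived from consecutive serverStatus samples by differencing cumulative counters. The sketch below is a minimal illustration, assuming the log holds one serverStatus document per line as JSON, that each sample includes the `opcounters` fields and `connections.totalCreated`, and that samples are a fixed `interval_s` apart. `rates` and `peak_rates` are hypothetical helpers, not part of any MongoDB tool.

```python
import json

# opcounters fields reported by serverStatus; all are cumulative since mongod start
OPCOUNTER_FIELDS = ("insert", "query", "update", "delete", "getmore", "command")


def rates(prev, curr, interval_s):
    """Return (ops/sec, connections created/sec) between two samples."""
    ops = sum(curr["opcounters"][f] - prev["opcounters"][f] for f in OPCOUNTER_FIELDS)
    conns = curr["connections"]["totalCreated"] - prev["connections"]["totalCreated"]
    return ops / interval_s, conns / interval_s


def peak_rates(lines, interval_s):
    """Scan a JSON-lines serverStatus log and return the peak per-second rates."""
    samples = [json.loads(line) for line in lines if line.strip()]
    peak_ops = peak_conns = 0.0
    for prev, curr in zip(samples, samples[1:]):
        ops, conns = rates(prev, curr, interval_s)
        if ops < 0 or conns < 0:
            continue  # a counter went backwards, e.g. mongod restarted between samples
        peak_ops = max(peak_ops, ops)
        peak_conns = max(peak_conns, conns)
    return peak_ops, peak_conns
```

Usage would be along the lines of `peak_rates(open("ss.log"), 1)` for a 1-second capture. Real sample spacing drifts a little, so reading each sample's `localTime` would give more accurate rates than a fixed interval.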

Comment by Mike Bartlett [ 14/Oct/15 ]

So it started happening on the new primary a few minutes after the election, but only very briefly.

ss-10s.log should capture the 100% CPU spike.

I then noticed your instructions in the previous bug report for collecting an iostat log, so I dropped the serverStatus interval down to 1s and included iostat.log. However, the CPU appeared to have returned to normal levels either just before or during this collection period.
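A periodic serverStatus capture like ss-1s.log can be sketched as a loop that writes one JSON document per line. This is a minimal sketch under stated assumptions: `collect` is a hypothetical helper, and `fetch_status` is any zero-argument callable returning the serverStatus document as a dict (with pymongo, for example, `lambda: client.admin.command("serverStatus")`). iostat is a separate OS tool and would run alongside it.

```python
import io
import json
import time


def collect(fetch_status, out, interval_s=1.0, count=None, sleep=time.sleep):
    """Write one serverStatus sample per line to `out` every `interval_s` seconds.

    count=None keeps sampling until interrupted; `sleep` is injectable for testing.
    """
    n = 0
    while count is None or n < count:
        doc = fetch_status()
        # default=str turns values json cannot encode (e.g. datetimes) into strings
        out.write(json.dumps(doc, default=str) + "\n")
        out.flush()
        n += 1
        if count is None or n < count:
            sleep(interval_s)


if __name__ == "__main__":
    # Stand-in fetch for demonstration; a real capture would query mongod instead
    fake = iter([{"n": 1}, {"n": 2}])
    buf = io.StringIO()
    collect(lambda: next(fake), buf, count=2, sleep=lambda s: None)
    print(buf.getvalue(), end="")
```

Because the loop sleeps after each fetch, the real period is slightly longer than `interval_s`. A 1-second interval resolves brief spikes that a 10-second interval would average away, which matters when the spike only lasts moments.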

Comment by Mike Bartlett [ 14/Oct/15 ]

What I'd really love to know is why the replica set doesn't pick this up and re-elect, because clearly the primary is unhealthy in this state.
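One likely reason, based on MongoDB's replica set documentation: members send heartbeats to each other every two seconds and mark a member inaccessible only if a heartbeat gets no reply within 10 seconds. A primary that is CPU-saturated but still answering heartbeats therefore stays "reachable", and no election is triggered. The sketch below is a deliberately simplified, hypothetical model of that rule, not mongod's actual failure-detection code.

```python
HEARTBEAT_INTERVAL_S = 2.0   # members ping each other every 2 seconds
HEARTBEAT_TIMEOUT_S = 10.0   # an unanswered heartbeat marks the member inaccessible


def first_unreachable_time(reply_latencies):
    """Simplified heartbeat failure detector (hypothetical model).

    Heartbeat i is sent at t = i * HEARTBEAT_INTERVAL_S. reply_latencies[i] is
    how long the primary took to answer it, or None if it never answered.
    Returns when the primary would first be marked inaccessible, or None if
    every heartbeat was answered within the timeout.
    """
    for i, latency in enumerate(reply_latencies):
        sent = i * HEARTBEAT_INTERVAL_S
        if latency is None or latency > HEARTBEAT_TIMEOUT_S:
            return sent + HEARTBEAT_TIMEOUT_S
    return None


if __name__ == "__main__":
    overloaded = [3.5] * 30            # slow but always answering: never marked down
    crashed = [0.01, 0.01, None]       # stops answering at the third heartbeat
    print(first_unreachable_time(overloaded), first_unreachable_time(crashed))
```

Under this model, slow responses alone never trigger failover; only missed heartbeats do, which matches the behaviour described above, where a manual stepDown was needed.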

Generated at Thu Feb 08 03:55:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.