[SERVER-51684] Every 4 hours CPU usage is going high varying from 100% to 200% because of mongo SECONDARY / replica node reciding on server Created: 16/Oct/20 Updated: 28/Oct/20 Resolved: 28/Oct/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking, Performance, Replication |
| Affects Version/s: | 4.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Prakhar Makhija | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
2020-10-16T00:06:57.310+0000 I NETWORK [LogicalSessionCacheRefresh] Successfully connected to mongo1.production.internal:27017 (527 connections now open to mongo1.production.internal:27017 with a 0 second timeout) db version v4.0.5 mongo --version git version: 3739429dd92b92d1b0ab120911a23d50bf03c412
|
| Comments |
| Comment by Prakhar Makhija [ 27/Oct/20 ] |
|
Hi @edwin.zhou@mongodb.com Thanks for sharing the analysis, and guiding me in the right direction Will look into it Regards, |
| Comment by Edwin Zhou [ 23/Oct/20 ] |
|
Thank you for sharing your diagnostics with us. After some investigating, we suspect that an external process is increasing your system's reported CPU load. Here is an analysis of the timeline as evidence of this:
Point A System memory and CPU usage begins to increase at exactly 00:00 indicated by system memory anon. The mongod does not perform scheduled tasks like this. As system memory increases, memory allocated by the mongod stays unchanged as indicated by current_allocated_bytes. There is also an increase in CPU iowait, which isn't exactly CPU usage. Instead, this means that the CPU is needing to wait for i/o, as confirmed by the disk usage. However, looking at block-manager read/write, the mongod isn't doing more i/o. Point B Resident memory used by the mongod begins to decrease but allocated memory remains unchanged. We suspect an external process has allocated memory which caused paging and memory used by the mongod is forced out. Point C The mongod does not perform any tasks that occur at an exact time, i.e. exactly at midnight and every 4 hours. Since this appears to be a scheduled task, you should check to see if there are any running cron jobs that may be affecting your system. You can also run ps or top during the times that CPU iowait is high to identify running processes that increase load. Best, |
| Comment by Prakhar Makhija [ 20/Oct/20 ] |
|
Hi @Bruce Lucas @Eric Sedor My bad, the link was not public then, it is now _____________________________________________ Regarding TimeZones I fall in IST (GMT+5:30) The SNMP alert email specifies CEST (GMT+2:00) The mongod server is residing in Europe (Ireland) - Amazon Region (eu-west-1) _____________________________________________ SNMP Email Alerts Date/Time: Tue Oct 20 14:09:03 CEST 2020 Date/Time: Tue Oct 20 10:10:51 CEST 2020 Date/Time: Tue Oct 20 02:13:28 CEST 2020 Date/Time: Mon Oct 19 22:10:16 CEST 2020 Date/Time: Mon Oct 19 18:12:04 CEST 2020 Date/Time: Mon Oct 19 14:08:52 CEST 2020 Date/Time: Mon Oct 19 10:10:42 CEST 2020 Date/Time: Mon Oct 19 06:12:26 CEST 2020 Date/Time: Mon Oct 19 02:14:10 CEST 2020 Date/Time: Sun Oct 18 22:10:59 CEST 2020 Date/Time: Sun Oct 18 18:12:44 CEST 2020 Date/Time: Sun Oct 18 14:09:30 CEST 2020 Date/Time: Sun Oct 18 10:11:19 CEST 2020 Date/Time: Sun Oct 18 06:13:05 CEST 2020 _____________________________________________ Server Mongo Logs |
| Comment by Bruce Lucas (Inactive) [ 20/Oct/20 ] |
|
Hi pmakhija@ask-fast.com, thanks for collecting the data. Unfortunately the link you provided produces an access denied error when we try to use it. Can you please attach the data to this ticket? If it is too large to attach, you can upload it to this secure private upload portal. Also, can you please provide a timeline of your issue: which node is seeing the problem at what specific time (including timezone)? |
| Comment by Prakhar Makhija [ 20/Oct/20 ] |
|
Hi @eric.sedor , please find the tarball attached Feel free to let me know if anything else is needed Thanks, Prakhar |
| Comment by Eric Sedor [ 19/Oct/20 ] |
|
Hi pmakhija@ask-fast.com, can you please provide more information about the issue you are reporting? High CPU use could have many causes. As well, would you please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket? Thanks, |