[SERVER-39684] Mongod Server slows down and caused high CPU usage Created: 20/Feb/19 Updated: 21/May/19 Resolved: 21/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.2.21, 3.2.22 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kevin Supendi | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Debian 8 |
||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Dear All, We have a replica set with 12 nodes total: 6 nodes on v3.2.21, 6 nodes on v3.2.22, and 1 hidden member. We have 40k connections from our clients, and we cannot reduce this number at the moment. The problem is that at some point during the day (usually in the early morning), mongod becomes very slow; even the isMaster command takes ~200 ms. In our Kibana slow logs, most queries show high timeAcquiringMicros. If we do not restart mongod, the result is an unreachable secondary, high CPU load and context switches, and eventually a server crash. Other metrics, such as memory and disk usage, are normal during the slowdown. We checked the slow logs and error logs but have not found any suspicious queries that could cause this, so we suspect that mongod becomes slow because of lock contention. We still do not know the root cause of this lock contention, though. This happens almost every day; we restart one or two nodes every morning because of high CPU usage and server crashes. We would like to know whether we can see which kinds of locks are most frequently used, or what the possible causes of this behavior are. Any help would be appreciated. PS: We still use 2.6 credentials. Does it matter? |
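One possible way to see which lock resources account for the waiting is to rank the `locks` section of `serverStatus` by `timeAcquiringMicros`. The following is a minimal sketch, not a diagnosis of this deployment: the function name `rank_lock_waits` and the sample numbers are made up for illustration, and only the field names (`acquireCount`, `acquireWaitCount`, `timeAcquiringMicros`) follow the `serverStatus` output.

```python
from typing import Dict, List, Tuple


def rank_lock_waits(locks: Dict[str, dict]) -> List[Tuple[str, str, int, int]]:
    """Return (resource, mode, wait_count, wait_micros) rows, longest wait first.

    `locks` is expected to be shaped like the "locks" section of serverStatus.
    Modes are r (intent shared), w (intent exclusive), R (shared), W (exclusive).
    """
    rows = []
    for resource, stats in locks.items():
        wait_micros = stats.get("timeAcquiringMicros", {})
        wait_counts = stats.get("acquireWaitCount", {})
        for mode, micros in wait_micros.items():
            rows.append((resource, mode, int(wait_counts.get(mode, 0)), int(micros)))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Made-up sample data (an assumption for illustration, not taken from this ticket).
sample_locks = {
    "Global": {
        "acquireCount": {"r": 1000, "w": 200},
        "acquireWaitCount": {"r": 5, "w": 40},
        "timeAcquiringMicros": {"r": 1200, "w": 900000},
    },
    "Database": {
        "acquireCount": {"r": 800},
        "acquireWaitCount": {"r": 2},
        "timeAcquiringMicros": {"r": 300},
    },
    "Collection": {"acquireCount": {"r": 700}},
}

for resource, mode, count, micros in rank_lock_waits(sample_locks):
    print(f"{resource:<12} mode={mode} waits={count} wait_micros={micros}")
```

With pymongo, the input could be obtained as `client.admin.command("serverStatus")["locks"]`. These counters are cumulative since the process started, so comparing two samples taken shortly before and during a slowdown is likely more informative than a single reading.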
| Comments |
| Comment by Evan Zhao [ 21/May/19 ] |
|
Hello Kevin, I have recently encountered the same problem. All we can do is monitor the logs every day; when "the server is very slow" appears, we restart mongod. It now happens about once a week. Have you solved this problem? Any help would be appreciated. |
| Comment by Eric Sedor [ 21/Feb/19 ] |
|
My apologies, Kevin; I overlooked the MongoDB version you provided initially. MongoDB 3.2 reached end of life in September of 2018, and the SERVER project is for bugs and feature requests for current versions. We aren't able to help you diagnose this deployment here. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group. See also our Technical Support page for additional support resources. |
| Comment by Kevin Supendi [ 21/Feb/19 ] |
|
Oh, I want to add: we set the database profiling level to 1 and the slow log threshold to 200 ms. |
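For reference, the setting described above corresponds to the `profile` database command with level 1 and `slowms` set to 200. The sketch below only builds the command document; the helper name `build_profile_command` is hypothetical, and how the setting was actually applied on this deployment is not stated in the ticket.

```python
def build_profile_command(level: int, slowms: int) -> dict:
    """Build a `profile` command document (level 0 = off, 1 = slow ops, 2 = all ops)."""
    if level not in (0, 1, 2):
        raise ValueError("profiling level must be 0, 1, or 2")
    if slowms < 0:
        raise ValueError("slowms must be non-negative")
    # The command name must be the first key; Python dicts keep insertion order.
    return {"profile": level, "slowms": slowms}


cmd = build_profile_command(1, 200)
print(cmd)
```

With pymongo, the document could be sent as `db.command(cmd)`, where `db` is a database handle. Profiling is configured per database and per mongod instance, so each node would need the same setting.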
| Comment by Kevin Supendi [ 21/Feb/19 ] |
|
Hello Eric, thank you for your response. Yesterday it happened again: 5 nodes went down. Let's call them node A (primary), B, C, D and E. They went down approximately between 05:30:00 and 06:00:00 GMT+7 on 20 Feb 2019.
I couldn't attach their files individually because they exceed the 150 MB file upload limit, so I put the zip (1.1 GB) at this Google Drive link: https://drive.google.com/file/d/1YyyzSyQWFJdTAvkspnEBK0XUh-4m-mWl/view?usp=sharing
The zip file contains a diagnostic.data zip and a day of mongod logs for each of the 5 servers. We hope that you can help us pinpoint the problem in our production setup. Thank you for your time. |
| Comment by Eric Sedor [ 21/Feb/19 ] |
|
Hello, would you please archive (tar or zip) the $dbpath/diagnostic.data directory and attach it to this ticket? A day of server logs that cover one or two incidents will also be helpful. |
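The archiving step requested above could be scripted roughly as follows. This is a sketch under assumptions: the function name `archive_diagnostic_data` and the example paths are hypothetical, and the actual `$dbpath` depends on the deployment's configuration.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_file: str) -> Path:
    """Create a gzip-compressed tar of <dbpath>/diagnostic.data and return its path."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    target = Path(out_file)
    with tarfile.open(str(target), "w:gz") as tar:
        # Store entries under "diagnostic.data/" rather than the full absolute path.
        tar.add(str(source), arcname=source.name)
    return target


# Example call (hypothetical paths):
# archive_diagnostic_data("/var/lib/mongodb", "/tmp/nodeA-diagnostic.data.tar.gz")
```

Writing the archive outside the data directory avoids adding files to `$dbpath` while mongod is running.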