Details
Bug
Resolution: Done
Major - P3
None
3.2.21, 3.2.22
None
None
Debian 8
ALL
Description
Dear All,
We have a replica set with 12 nodes in total (6 nodes on v3.2.21 and 6 nodes on v3.2.22), plus 1 hidden member. We have 40k client connections, and we cannot reduce this number at the moment.
The problem is that at a certain time of day (usually early in the morning), mongod becomes very slow; even the isMaster command takes ~200 ms. In our Kibana slow logs, most queries show high timeAcquiringMicros. If we do not restart mongod, the secondary becomes unreachable, CPU load and context switches spike, and the server crashes. Other metrics, such as memory and disk usage, are normal during the slowdown. We checked the slow logs and error logs but found no suspicious queries that could explain this, so we suspect that mongod slows down because of lock contention. We still do not know the root cause of this contention, though.
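One way to check the lock-contention suspicion while a node is slow is to look at which in-progress operations are blocked. The sketch below is a minimal illustration, assuming the "inprog" array from the currentOp command has already been fetched (for example with pymongo's `client.admin.command("currentOp")["inprog"]`); the function name `summarize_lock_waiters` is hypothetical. It only counts entries whose `waitingForLock` field is true, grouped by operation type and namespace.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def summarize_lock_waiters(
    inprog: Iterable[Dict[str, Any]],
) -> List[Tuple[Tuple[str, str], int]]:
    """Count in-progress operations blocked on a lock, grouped by (op, ns).

    Hypothetical helper: `inprog` is assumed to be the "inprog" array
    returned by currentOp. Only entries with waitingForLock == True count.
    """
    waiting = Counter(
        (op.get("op", "?"), op.get("ns", "?"))
        for op in inprog
        if op.get("waitingForLock")
    )
    # Most frequent (op, namespace) pairs first.
    return waiting.most_common()


# Illustrative input shaped like a few currentOp entries (made-up values).
sample = [
    {"op": "query", "ns": "app.users", "waitingForLock": True},
    {"op": "query", "ns": "app.users", "waitingForLock": True},
    {"op": "update", "ns": "app.orders", "waitingForLock": True},
    {"op": "command", "ns": "admin.$cmd", "waitingForLock": False},
]
print(summarize_lock_waiters(sample))
```

Taking a few such snapshots during the morning slowdown would show whether the waiters concentrate on one namespace or are spread across the whole node.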
This happens almost every day; we restart one or two nodes every morning because of high CPU and server crashes. We would like to know whether we can see which locks are acquired and waited on most frequently, or what could cause this kind of behavior.
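To see which locks are waited on most, the `locks` section of serverStatus reports per-resource counters (`acquireWaitCount`, `timeAcquiringMicros`) keyed by mode (`r` intent shared, `w` intent exclusive, `R` shared, `W` exclusive). These counters are cumulative since startup, so comparing two snapshots taken a minute or so apart is more informative than a single reading. The sketch below assumes both snapshots were fetched separately (for example with pymongo's `client.admin.command("serverStatus")["locks"]`); the function name `lock_wait_delta` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Lock modes as reported in serverStatus: intent shared, intent exclusive,
# shared, exclusive.
MODES = ("r", "w", "R", "W")


def _counter(stats: Dict[str, Any], field: str, mode: str) -> int:
    """Read one counter, treating a missing field or mode as zero."""
    return stats.get(field, {}).get(mode, 0)


def lock_wait_delta(
    before: Dict[str, Any], after: Dict[str, Any]
) -> List[Tuple[str, str, int, int]]:
    """Compare two serverStatus()["locks"] snapshots (hypothetical helper).

    Returns (resource, mode, new_waits, new_wait_micros) for every pair that
    accumulated waits between the snapshots, largest wait time first.
    """
    rows = []
    for resource, stats in after.items():
        prev = before.get(resource, {})
        for mode in MODES:
            waits = _counter(stats, "acquireWaitCount", mode) - _counter(
                prev, "acquireWaitCount", mode
            )
            micros = _counter(stats, "timeAcquiringMicros", mode) - _counter(
                prev, "timeAcquiringMicros", mode
            )
            if waits > 0 or micros > 0:
                rows.append((resource, mode, waits, micros))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


# Illustrative snapshots (made-up numbers, not from this cluster).
before = {
    "Global": {"acquireWaitCount": {"r": 10}, "timeAcquiringMicros": {"r": 1000}},
    "Database": {"acquireWaitCount": {"w": 2}, "timeAcquiringMicros": {"w": 500}},
}
after = {
    "Global": {
        "acquireWaitCount": {"r": 15, "W": 1},
        "timeAcquiringMicros": {"r": 3000, "W": 9000},
    },
    "Database": {"acquireWaitCount": {"w": 2}, "timeAcquiringMicros": {"w": 500}},
}
print(lock_wait_delta(before, after))
```

In this made-up example a single exclusive (`W`) Global acquisition accounts for most of the wait time, which is the kind of pattern worth correlating with scheduled jobs that run in the early morning.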
Any help would be appreciated.
Thank you.
PS: We still use 2.6-format credentials. Does that matter?