Priority: Major - P3
Affects Version/s: 3.4.13, 3.4.14, 3.6.3
Fix Version/s: None
We have a problem with a MongoDB (3.6.3) PRIMARY server. After some time it gets into a state where it is still PRIMARY but no longer accepts connections. The problem is that it keeps the PRIMARY role, and because of that our app crashes. Restarting MongoDB on the PRIMARY server helps and everything goes back to normal.
We are hosting MongoDB on Amazon on 3 Ubuntu m5.4xlarge instances with 3000 IOPS EBS volumes.
During the crash we have ~30% more connections to MongoDB than usual, but they are still far below the limits and far below the fs.file-max setting, which is set to 6430188. No other metric looks suspicious. RAM, CPU, disk and network usage are at the same level as just before the crash and right after the restart of the PRIMARY. We have already migrated MongoDB from 3.4.14 to 3.6.3 and the problem still occurs every 1-2 days. We have also changed the priority of the PRIMARY server and moved this role to another host, so it's not tied to any specific machine.
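To catch the exact moment the PRIMARY stops accepting connections, a small probe run from an app host might help. The following is only a minimal sketch: the function names `can_connect` and `watch`, the timeout and the interval are placeholders we made up, and a successful TCP connect only shows that the listening socket accepts connections, not that mongod actually answers commands.

```python
import socket
import time


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, port, interval=5.0, checks=3):
    """Probe host:port `checks` times and return a list of (timestamp, ok) tuples."""
    results = []
    for i in range(checks):
        results.append((time.time(), can_connect(host, port)))
        if i < checks - 1:
            time.sleep(interval)  # wait between probes, but not after the last one
    return results
```

Calling something like `watch("primary-host", 27017)` in a loop (the hostname is a placeholder, 27017 is the default MongoDB port) and logging the results next to the connection count could show whether refusals start at a particular connection level.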
There is nothing interesting in the logs.
Here is the output of some commands that we ran while the server was in the unresponsive state:
Any idea what else we should check to debug it?