Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: None
Component/s: None
Hi,
We're running a 3-node replica set with MongoDB version 4.0.1. Until recently we ran the same data set on a replica set with version 2.4, and we saw the same issue there.
Once in a while the load on the primary node suddenly spikes and active reads pile up. See the attached screenshot from our Grafana dashboard. When this happens, the cluster is unable to answer queries at all. The short-term workaround is to either run rs.stepDown() on the primary or restart its mongod completely.
We'd like input on how to debug this further. We haven't yet spotted a query that looks likely to cause it. The replica set ran fine for years before the issue first appeared a few months ago, and we're unsure what is causing it.
Attached are MongoDB metrics and host metrics where the problem can be seen.
Thanks!