Hey guys, we observed some weird behaviour with the following setup:

All times are UTC

3-member replica set
- two bigger instances for failover - rs1-1 and rs1-2
- one smaller instance for backups

Around 00:31 the primary rs1-1 had a major spike in memory usage.
- this is inferred from "Cannot allocate memory" messages in the syslog of the instance
- based on the mongo logs, there were no heavy queries running at the time
After rs1-1 became unresponsive, rs1-2 became the new primary and had a similar memory usage spike around 00:37
- again inferred from the syslog
- again no big queries can be seen in the mongo log
Both instances were unresponsive (could not SSH in, no metrics reported) for a few hours, until we restarted them
Upon restart, rs1-1 crashed one more time around 06:44
After the second crash I scaled up the machines, and they have been running OK since then

You can see attached:

Let me know if you need any more information.