• Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.4.16
    • Component/s: Replication, Stability
    • Labels:
      None
    • Fully Compatible
    • ALL

      Hey guys, we observed some weird behaviour with the setup described below.

      All times are UTC

      • 3-member replica set
        • two bigger instances for failover - rs1-1 and rs1-2
        • one smaller instance for backups
      1. Around 00:31 the primary rs1-1 had a major spike in memory usage.
        • this is inferred from "Cannot allocate memory" messages in the syslog of the instance
        • based on the mongo logs, there were no heavy queries running at the time
      2. After rs1-1 became unresponsive, rs1-2 became the new primary and had a similar memory usage spike around 00:37
        • again inferred from the syslog
        • again no big queries can be seen in the mongo log
      3. Both instances were unresponsive (not reachable over SSH, not reporting metrics) for a few hours until they were restarted
      4. Upon restart, rs1-1 crashed one more time around 06:44
      5. After the second crash I scaled up the machines and they have been running OK since then

      You can see attached:

      • mongo logs from both servers
      • diagnostics.data from both servers

      Let me know if you need any more information.

        1. rs1-1-diagnostics.tar
          47.37 MB
        2. rs1-1-mongo-log
          1.80 MB
        3. rs1-2-diagnostics.tar
          34.11 MB
        4. rs1-2-mongo-log
          733 kB

            Assignee:
            dmitry.agranat@mongodb.com Dmitry Agranat
            Reporter:
            adamof Stefan Adamov
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: