  1. Core Server
  2. SERVER-42493

Replica set crashes



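    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 3.4.16
    • Fix Version/s: None
    • Component/s: Replication, Stability
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL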


      Hey guys, we observed some weird behaviour with the following setup:

      All times are UTC

      • 3-member replica set
        • two bigger instances for failover - rs1-1 and rs1-2
        • one smaller instance for backups
      1. Around 00:31 the primary rs1-1 had a major spike in memory usage.
        • this is inferred from "Cannot allocate memory" messages in the syslog of the instance
        • based on the mongo logs, there were no heavy queries running at the time
      2. After rs1-1 became unresponsive, rs1-2 became the new primary and had a similar memory usage spike around 00:37
        • again inferred from the syslog
        • again no big queries can be seen in the mongo log
      3. Both instances were unresponsive (could not SSH in, no metrics reported) for a few hours until we restarted them
      4. Upon restart, rs1-1 crashed one more time around 06:44
      5. After the second crash I scaled up the machines and they have been running OK since then.
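
      For anyone who wants to repeat the two checks behind this timeline, here is a rough sketch of how they could be scripted. It is not the exact procedure we used. It assumes a traditional syslog line prefix ("Mon DD HH:MM:SS host ...") and the mongod 3.4 text log format, where slow-operation lines end with their duration in milliseconds. The function names and the 1000 ms threshold are made up for illustration.

```python
import re
from collections import Counter

# Assumption: traditional syslog prefix, "Mon DD HH:MM:SS host proc: message".
SYSLOG_TS = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}):\d{2}\s")
# Assumption: mongod 3.4 text log, where slow-operation lines end in "<N>ms".
SLOW_OP = re.compile(r"^(\S+)\s.*\s(\d+)ms$")


def alloc_failures_per_minute(lines):
    """Count 'Cannot allocate memory' syslog lines, grouped by minute."""
    counts = Counter()
    for line in lines:
        if "Cannot allocate memory" not in line:
            continue
        match = SYSLOG_TS.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def slow_ops(lines, threshold_ms=1000):
    """Return (timestamp, duration_ms) for log lines at or above threshold_ms."""
    hits = []
    for line in lines:
        match = SLOW_OP.match(line.rstrip())
        if match and int(match.group(2)) >= threshold_ms:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

      Feeding the syslog lines to `alloc_failures_per_minute` shows when the allocation failures cluster, and an empty result from `slow_ops` on the mongo log for the same window matches the "no heavy queries" observation.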

      You can see attached:

      • mongo logs from both servers
      • diagnostics.data from both servers

      Let me know if you need any more information.


        1. rs1-1-diagnostics.tar
          47.37 MB
        2. rs1-1-mongo-log
          1.80 MB
        3. rs1-2-diagnostics.tar
          34.11 MB
        4. rs1-2-mongo-log
          733 kB



            dmitry.agranat@mongodb.com Dmitry Agranat
            adamof Stefan Adamov