Core Server / SERVER-31457

Mongod stops responding, takes 200 load and doesn't even switch to secondary

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.4.5
    • Component/s: Stability, WiredTiger
    • Fully Compatible
    • ALL

      Impossible to know of course


      I just hit a pretty bad issue with mongod 3.4.5 (WT). The requests were totally normal, nothing out of the ordinary, and suddenly the load average on my server climbed to 200, saturating all 8 CPUs of course (see the attached screenshots).

      At this point the server stopped responding to any request, but it seems it kept pinging the secondaries and syncing, because it stayed primary until I manually changed the priority from the secondary (I couldn't even SSH into the primary, as the load was killing the machine).
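For readers who end up in the same situation, the manual priority change described above can be sketched roughly as follows. This is only an illustration under assumptions, not the exact commands used during the incident: the helper name `promote_member` and the host names are hypothetical, and the function only builds the modified replica set configuration document without talking to a server.

```python
import copy


def promote_member(config, host, new_priority=10):
    """Return a copy of a replica set config in which `host` gets `new_priority`.

    `config` is a document shaped like the one returned by the
    `replSetGetConfig` admin command. The input is not modified.
    """
    new_config = copy.deepcopy(config)
    found = False
    for member in new_config["members"]:
        if member["host"] == host:
            member["priority"] = new_priority
            found = True
    if not found:
        raise ValueError("host %r is not a member of this replica set" % host)
    # A reconfig is expected to carry a higher config version than the current one.
    new_config["version"] = config["version"] + 1
    return new_config


# Hypothetical usage with pymongo, connected directly to the healthy secondary
# (not executed here; the host name is an assumption):
#
#   from pymongo import MongoClient
#   client = MongoClient("secondary.example.com", 27017)
#   current = client.admin.command("replSetGetConfig")["config"]
#   wanted = promote_member(current, "secondary.example.com:27017")
#   # force=True is needed because the command is sent to a non-primary member
#   client.admin.command("replSetReconfig", wanted, force=True)
```

A forced reconfiguration bypasses the usual safety checks, so it is meant for emergencies like this one where the primary cannot be reached at all.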

      As I have had numerous problems of this kind in the past due to various performance issues in WT (cache eviction, etc.; see SERVER-27700), I tried letting it rest to see if it would recover, but after 3 hours I had to hard reboot the server to get it back...

      I checked the logs after the reboot and there was not a single log line during those 3 hours, and the lines before the crash show nothing unusual to me. I collected the diagnostic.data directory and can give it to you (along with the last hour of logs) if you send me your usual upload link.

      If you can access my MongoDB Cloud Manager stats, the project id is 5012a0ac87d1d86fa8c22e64. Otherwise I can give you some screenshots, but there's nothing very interesting, as these charts were all totally normal until the agent stopped collecting data.

      Thanks for your help

        1. mongodb-incident-1.png (24 kB)
        2. mongodb-incident-2.png (20 kB)
        3. cpu-time.png (22 kB)
        4. disk-usage.png (27 kB)

            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            bigbourin@gmail.com Adrien Jarthon