  1. Core Server
  2. SERVER-31457

Mongod stops responding, reaches a load of 200, and doesn't even switch to secondary

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.4.5
    • Fix Version/s: None
    • Component/s: Stability, WiredTiger
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      Impossible to know of course

      Description

      Hello,

      I just hit a pretty bad issue with mongod 3.4.5 (WT). The requests were totally normal, nothing out of the ordinary, and suddenly the load on my server climbed to 200, using all 8 CPUs of course.

      At this point the server stopped responding to any request, but it seems it kept pinging the secondaries and syncing, as it stayed primary until I manually changed the priority from the secondary (I couldn't even SSH into the primary, as it was killing the machine).

      As I have had numerous problems of this kind in the past due to various performance issues in WT, cache eviction, etc. (SERVER-27700), I tried to let it rest to see if it would recover, but after 3 hours I had to hard reboot the server to get it back...

      I checked the logs after the reboot and there was not a single log line during those 3 hours, and the ones before the crash show nothing unusual to me. I collected the diagnostic dir and can give it to you (along with the last hour of logs) if you send me your usual upload link.

      If you can access my MongoDB Cloud Manager stats, the project id is 5012a0ac87d1d86fa8c22e64; otherwise I can give you some screenshots, but there's nothing very interesting, as these charts were all totally normal until the agent stopped collecting data.

      Thanks for your help

        Attachments

        1. mongodb-incident-1.png (24 kB)
        2. mongodb-incident-2.png (20 kB)
        3. cpu-time.png (22 kB)
        4. disk-usage.png (27 kB)

              People

              • Votes: 0
              • Watchers: 11