  Core Server / SERVER-82398

Excessive memory and CPU consumption during normal operation

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Ubuntu 18.04.6 LTS
      XFS
      Kernel - 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      Transparent Huge Pages disabled
      AWS m5.2xlarge
      gp3 SSD, 450 GB
    • Server Triage
    • ALL

      Hi!

      On one of our shards, the memory consumption of the primary replica suddenly started to grow rapidly, accompanied by high CPU usage. The replica then became unresponsive, and another replica was elected primary. Right after that, the same thing happened to the new primary.

      The incident timeline:

      1. 10/24/23 7:40 - the incident begins (peak in CPU and memory consumption)
      2. 10/24/23 8:20-8:26 (exact time unknown) - the primary (replica-1) becomes unresponsive, another replica (replica-2) becomes the new primary, and we see another peak in CPU and memory consumption
      3. 10/24/23 8:38 - the new primary (replica-2) becomes unresponsive, and another replica (replica-1) becomes the new primary
      4. 10/24/23 8:43 - the replica that never appeared to assume the primary role (replica-3) starts experiencing the same CPU and memory problems
      5. 10/24/23 9:20 - we manually restart replica-3, and the incident ends

      Unfortunately, we couldn't get to the root of the problem, but here are some things we observed:

      • We noticed that the number of open cursors jumped to 500 on the replicas mentioned above (we use Change Streams, so this might be related)
      • On replica-3 there were dozens of "hanging" aggregation commands (the secs_running values were very large, around 2000 seconds)
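      One way to list operations like these is to filter the `inprog` documents returned by the `currentOp` command. The sketch below is a hypothetical helper (the name `long_running_aggregations` and the 600-second default threshold are assumptions for illustration, not MongoDB features). It keeps `aggregate` commands whose `secs_running` meets the threshold and flags the ones whose first pipeline stage is `$changeStream`, which could help tell change-stream aggregations apart from the others.

```python
from typing import Any, Dict, Iterable, List


def long_running_aggregations(
    ops: Iterable[Dict[str, Any]], min_secs: int = 600
) -> List[Dict[str, Any]]:
    """Summarize aggregate commands running for at least `min_secs` seconds.

    `ops` is assumed to be the `inprog` array of a `currentOp` result, e.g.
    (assumed usage) client.admin.command("currentOp", active=True)["inprog"]
    """
    result = []
    for op in ops:
        cmd = op.get("command", {})
        # Only aggregate commands are of interest here.
        if "aggregate" not in cmd:
            continue
        secs = op.get("secs_running", 0)
        if secs < min_secs:
            continue
        pipeline = cmd.get("pipeline", [])
        # A change stream is an aggregation whose first stage is $changeStream.
        is_change_stream = bool(pipeline) and "$changeStream" in pipeline[0]
        result.append(
            {
                "opid": op.get("opid"),
                "ns": op.get("ns"),
                "secs_running": secs,
                "change_stream": is_change_stream,
            }
        )
    # Longest-running operations first.
    result.sort(key=lambda r: r["secs_running"], reverse=True)
    return result
```

      With PyMongo, the input could come from `client.admin.command("currentOp", active=True)["inprog"]`. The helper itself only inspects plain dictionaries, so it can also be applied to a saved copy of the `currentOp` output.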

      Could you help us identify the cause of the problem?

      I'm attaching the diagnostic data from the aforementioned replicas (the files are named replica-1, replica-2 and replica-3, matching the replica numbers used above).

            Assignee:
            yuan.fang@mongodb.com Yuan Fang
            Reporter:
            vladimirred456@gmail.com Vladimir Beliakov
            Votes:
            0
            Watchers:
            7
