Core Server / SERVER-45315

When a replica-set member goes offline, other members' CPUs spike to 100%

    • Type: Bug
    • Resolution: Community Answered
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Performance, Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      TL;DR:

      Powering off one of the MongoDB shard members causes the other members' CPUs to spike to 100%.

      Background:

      I want to deploy a MongoDB cluster across several ESX hosts. The cluster has to tolerate the shutdown of two components.

      Cluster architecture (MongoDB 4.2); a replica-set configuration sketch for shard01 follows the list:

      • 5 config servers
      • 3 query routers (mongos)
      • shard01:
        • 1 primary
        • 2 secondaries
        • 2 arbiters
      • shard02:
        • 1 primary
        • 2 secondaries
        • 2 arbiters
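      For reference, a minimal mongo-shell sketch of how a shard replica set with this member mix (1 primary, 2 secondaries, 2 arbiters) could be initiated. The set name matches shard01 above, but the hostnames and ports are placeholders, not the actual topology:

      {code:javascript}
      // Hypothetical configuration for shard01: three data-bearing members and two arbiters.
      // Hostnames and ports are illustrative placeholders.
      rs.initiate({
        _id: "shard01",
        members: [
          { _id: 0, host: "shard01-a.example.net:27018" },                       // data-bearing (becomes primary by election)
          { _id: 1, host: "shard01-b.example.net:27018" },                       // secondary
          { _id: 2, host: "shard01-c.example.net:27018" },                       // secondary
          { _id: 3, host: "shard01-arb1.example.net:27018", arbiterOnly: true }, // arbiter
          { _id: 4, host: "shard01-arb2.example.net:27018", arbiterOnly: true }  // arbiter
        ]
      })
      {code}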

      The problem:

      Whenever I test HA by taking one of the members offline, I notice after several minutes that the remaining members' CPU spikes to 100% and stays there until I bring the missing member back.

      Tests I have conducted (see the sketch after this list for one way to confirm member states during these tests):

      1. shut down 1 data-bearing member -> remaining members' CPU rises to 100%
      2. shut down 1 data-bearing member and 1 arbiter -> remaining members' CPU rises to 100%
      3. shut down 1 arbiter -> members are OK
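      One way to confirm what the surviving members see during each test is to inspect the replica-set status from any member that is still up; a minimal sketch (the fields come from the standard rs.status() output):

      {code:javascript}
      // Print each member's name, state and health as seen by the node this runs on.
      // Unreachable members report health 0 and a "(not reachable/healthy)" state.
      rs.status().members.forEach(function (m) {
        print(m.name + "  state=" + m.stateStr + "  health=" + m.health);
      });
      {code}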

      Things I have already checked:

      • When checking the problematic VMs, I noticed that mongod is the process consuming most of the CPU (99%).
      • I checked mongod for long-running queries with db.currentOp() (see the sketch after this list); everything looks fine.
      • mongod.log does not contain any suspicious entries.
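      For reference, a minimal sketch of the kind of db.currentOp() filter used to look for long-running operations; the 5-second threshold is an illustrative assumption, not the exact query that was run:

      {code:javascript}
      // List active operations that have been running for more than 5 seconds.
      db.currentOp({ active: true, secs_running: { $gt: 5 } }).inprog.forEach(function (op) {
        printjson({ opid: op.opid, ns: op.ns, secs_running: op.secs_running, desc: op.desc });
      });
      {code}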

      Bottom line:

      I posted the problem on [Stack Overflow|https://stackoverflow.com/questions/59491006/why-when-one-of-mongodb-replica-set-shard-members-goes-offline-the-others-cpus] and was advised to report it as a bug.

      Regards,

      Aric

            Assignee: Dmitry Agranat (dmitry.agranat@mongodb.com)
            Reporter: Arik Nano (naheim.lavon@opka.org)
            Votes: 0
            Watchers: 5
