Core Server / SERVER-71246

Excessive CPU consumption on an instance newly upgraded to 5.0

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Environment:
      Ubuntu 18.04.6 LTS
      XFS
      Kernel - 5.4.0-1088-aws #96~18.04.1-Ubuntu SMP Mon Oct 17 02:57:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
      Transparent Huge Pages disabled
      AWS m5.2xlarge
      SSD gp3 450 GB
    • Operating System: ALL

      We're in the process of upgrading our MongoDB cluster from 4.4.15 to 5.0.10.
      According to the upgrade documentation, we stopped the balancer, then upgraded the config replica set.
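      (For reference, the balancer part of that step boils down to a couple of mongosh commands; this is only a rough sketch of the documented procedure, with connection details omitted, and we have not yet reached the final FCV bump.)

        // Run against a mongos: stop the balancer before upgrading any components
        sh.stopBalancer()
        sh.getBalancerState()   // should report false before proceeding

        // Only once every binary is on 5.0 and the cluster is healthy would we
        // raise the feature compatibility version as the very last step:
        db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 })
        // db.adminCommand({ setFeatureCompatibilityVersion: "5.0" })
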
      Then we started upgrading the shards. After upgrading 3 out of 4 replicas, we were ready to move the primary onto the already upgraded MongoDB instances.
      Everything was fine for 9 of the 10 shards.
      However, for the biggest shard (it contains most of the unsharded data), we ran into excessive CPU consumption. We tried to restart the mongod process, but to no avail.
      Going over the logs didn't give us much insight regarding why the process was consuming so much CPU.
      There were many "Slow query" log records, which makes sense. There were also many "WriteConflict" log records like this one:

      {"t":

      {"$date":"2022-11-07T09:08:24.320+00:00"}

      ,"s":"W", "c":"COMMAND", "id":23802, "ctx":"conn3798","msg":"Plan executor error during findAndModify","attr":{"error":

      {"code":112,"codeName":"WriteConflict","errmsg":"WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction."}

      ,"stats":{"stage":"UPDATE","nReturned":0,"executionTimeMillisEstimate":1,"works":14,"advanced":0,"needTime":12,"needYield":1,"saveState":1,"restoreState":1,"failed":true,"isEOF":1,"nMatched":0,"nWouldModify":0,"nWouldUpsert":0,"inputStage":{"stage":"LIMIT","nReturned":1,"executionTimeMillisEstimate":1,"works":13,"advanced":1,"needTime":12,"needYield":0,"saveState":2,"restoreState":1,"isEOF":1,"limitAmount":1,"inputStage":{"stage":"FETCH","nReturned":1,"executionTimeMillisEstimate":1,"works":13,"advanced":1,"needTime":12,"needYield":0,"saveState":2,"restoreState":1,"isEOF":0,"docsExamined":1,"alreadyHasObj":0,"inputStage":{"stage":"SORT_MERGE","nReturned":1,"executionTimeMillisEstimate":1,"works":13,"advanced":1,"needTime":12,"needYield":0,"saveState":2,"restoreState":1,"isEOF":0,"sortPattern":{"

      {REDACTED}":1,"{REDACTED}

      ":1},"dupsTested":1,"dupsDropped":0,"inputStages":[{"stage":"IXSCAN","nReturned":1,"executionTimeMillisEstimate":0,"works":1,"advanced":1,"needTime":0,"needYield":0,"saveState":2,"restoreState":1,"isEOF":0,"keyPattern":
      {"REDACTED"}

      ,"indexName":"
      {REDACTED}

      ","isMultiKey":false,"multiKeyPaths":{"

      {REDACTED}

      ":[],"
      {REDACTED}

      ":[],"{REDACTED}

      ":[]},"isUnique":false,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"

      {REDACTED}":["[0, 0]"],"{REDACTED}

      ":["[MinKey, MaxKey]"],"
      {REDACTED}

      ":["[new Date(-9223372036854775808), new Date(1667812104306))"]},"keysExamined":1,"seeks":1,"dupsTested":0,"dupsDropped":0},{"stage":"IXSCAN","nReturned":0,"executionTimeMillisEstimate":1,"works":11,"advanced":0,"needTime":10,"needYield":0,"saveState":2,"restoreState":1,"isEOF":1,"keyPattern":{"

      {REDACTED}

      ":1,"
      {REDACTED}

      ":1,"{REDACTED}

      ":1},"indexName":"

      {REDACTED}","isMultiKey":false,"multiKeyPaths":{"{REDACTED}

      ":[],"

      {REDACTED}":[],"{REDACTED}

      ":[]},"isUnique":false,"isSparse":false,"isPartial":false,"indexVersion":2,"direction":"forward","indexBounds":{"
      {REDACTED}

      ":["[1, 1]"],"{REDACTED}

      ":["[MinKey, MaxKey]"],"

      {REDACTED}":["[new Date(-9223372036854775808), new Date(1667812104306))"]},"keysExamined":11,"seeks":11,"dupsTested":0,"dupsDropped":0}]}}}},"cmd":{"findAndModify":"{REDACTED}

      ","query":{"

      {REDACTED}":{"$in":[0,1]},"{REDACTED}

      ":{"$lt":
      {"$date":"2022-11-07T09:08:24.306Z"}

      }},"sort":{"
      {REDACTED}

      ":1,"{REDACTED}

      ":1},"update":{"$set":{"

      {REDACTED}":1,"{REDACTED}

      ":
      {"$date":"2022-11-07T09:19:54.305Z"}

      ,"
      {REDACTED}

      ":{"$binary":{"base64":"hW7CcBc5Uki6JVK/hvU5ew==","subType":"3"}},"{REDACTED}

      ":{"$date":"2022-11-07T09:08:24.305Z"}},"$unset":{"

      {REDACTED}":1},"$inc":{"{REDACTED}

      ":1}},"runtimeConstants":{"localNow":

      {"$date":"2022-11-07T09:08:24.305Z"}

      ,"clusterTime":{"$timestamp":

      {"t":1667812104,"i":2118}

      }},"writeConcern":{"w":1,"wtimeout":0,"provenance":"implicitDefault"}}}}

       

      We urgently need help, because this is a showstopper for our MongoDB cluster optimization.

      Please let me know if any additional information is required.

        1. diagnostics.data.zip
          25.95 MB
        2. Screen Shot 2022-11-10 at 4.46.12 PM.png
          270 kB

            Assignee:
            yuan.fang@mongodb.com Yuan Fang
            Reporter:
            vladimirred456@gmail.com Vladimir Beliakov
            Votes:
            0
            Watchers:
            6
