Core Server / SERVER-3236

High CPU wait load on capped collection reaching its limit

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker - P1
    • Fix Version/s: None
    • Affects Version/s: 1.8.1
    • Component/s: Performance
    • Labels:
    • Environment:
      Linux ip-10-164-19-105 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

      EC2 m2.2xlarge instance

      I am seeing very strange behavior of mongo on our Nimble server. We use mongodb to run the cache for our product. The cache consists of a single capped collection with a few indexed fields. Here are the stats for it:

      db> db.cache.stats()
      {
          "ns" : "nimble_cache.cache",
          "count" : 6795240,
          "size" : 29891215048,
          "avgObjSize" : 4398.8461111013,
          "storageSize" : 30000000304,
          "numExtents" : 15,
          "nindexes" : 3,
          "lastExtentSize" : 256,
          "paddingFactor" : 1,
          "flags" : 0,
          "totalIndexSize" : 3472837440,
          "indexSizes" : {
              "cache_id_-1_created_-1" : 561824704,
              "depends_on_1_deadline_-1" : 2351080384,
              "cache_id_1_resource_1_created_-1" : 559932352
          },
          "capped" : 1,
          "max" : 2147483647,
          "ok" : 1
      }
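
      For reference, a capped collection with these stats could have been created roughly as sketched below (the size, max and index keys are taken from the stats output above; the field names are only inferred from the index names):

      // create a ~30 GB capped collection, capped at 2147483647 documents
      db.createCollection("cache", { capped: true, size: 30000000000, max: 2147483647 })

      // the three indexes reported in indexSizes above
      db.cache.ensureIndex({ cache_id: -1, created: -1 })
      db.cache.ensureIndex({ depends_on: 1, deadline: -1 })
      db.cache.ensureIndex({ cache_id: 1, resource: 1, created: -1 })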

      It runs on an m2.2xlarge EC2 instance, which has 34 GB of memory. Here are the memory stats of the mongo instance:

      > db.serverStatus().mem

      { "bits" : 64, "resident" : 6300, "virtual" : 34749, "supported" : true, "mapped" : 32960 }

      The load on the server isn't really high (the lock percentage, however, is terrible):
      [root@ip-10-164-1-195 ~]# /opt/mongo/bin/mongostat
      connected to: 127.0.0.1
      insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn time
      93 368 89 0 0 86 0 32.2g 33.9g 6.2g 7 39.8 0 5|7 5|8 444k 684k 151 22:56:44
      66 302 85 0 0 90 0 32.2g 33.9g 6.2g 3 77.4 0 0|0 0|0 325k 504k 151 22:56:45
      47 153 43 0 0 44 0 32.2g 33.9g 6.2g 4 77.1 0 0|0 0|0 327k 386k 151 22:56:46
      67 256 70 0 0 66 0 32.2g 33.9g 6.2g 4 35.9 0 6|8 6|9 329k 437k 151 22:56:47
      103 384 104 0 0 106 0 32.2g 33.9g 6.2g 5 60.9 0 6|6 6|7 449k 755k 151 22:56:48
      56 207 71 0 0 75 0 32.2g 33.9g 6.2g 5 53.3 0 5|6 5|7 390k 623k 151 22:56:49
      47 201 40 0 0 39 0 32.2g 33.9g 6.2g 4 78.1 0 4|4 4|5 348k 473k 151 22:56:50
      99 381 117 0 0 121 0 32.2g 33.9g 6.2g 5 27.6 0 0|0 0|0 445k 766k 151 22:56:51
      78 308 89 0 0 90 0 32.2g 33.9g 6.21g 5 39.3 0 0|0 0|0 461k 657k 151 22:56:52
      115 374 122 0 0 120 0 32.2g 33.9g 6.21g 8 4.7 0 3|7 3|8 484k 647k 151 22:56:54
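
      The locked % column above can be cross-checked straight from serverStatus: globalLock reports cumulative lockTime and totalTime in microseconds, so sampling it twice and taking the deltas gives the lock percentage over the interval (a minimal sketch):

      // sample globalLock twice and compute the lock percentage over one second
      var a = db.serverStatus().globalLock;
      sleep(1000);
      var b = db.serverStatus().globalLock;
      var pct = 100 * (b.lockTime - a.lockTime) / (b.totalTime - a.totalTime);
      print("locked % over the last second: " + pct.toFixed(1));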

      All updates are on non-indexed fields. All queries are using indexes.
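
      Index usage can be confirmed per query with explain(); in 1.8 an indexed plan reports a BtreeCursor rather than a BasicCursor. A sketch, assuming a lookup by cache_id sorted by created, as the index names suggest:

      // should report "cursor" : "BtreeCursor cache_id_-1_created_-1" and a low nscanned;
      // a collection scan would show "BasicCursor" instead
      db.cache.find({ cache_id: "some-key" }).sort({ created: -1 }).explain()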

      The problem I have is very high disk IO activity, which is what causes the lock levels above. Here's the output of iostat:

      [root@ip-10-164-1-195 ~]# iostat -xdm /dev/sdk 2
      Linux 2.6.21.7-2.fc8xen (ip-10-164-1-195) 06/07/11

      Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
      sdk 0.05 259.76 83.07 407.36 2.52 5.83 34.86 43.29 88.28 1.63 79.93
      sdk 0.00 0.00 6.00 437.00 0.08 1.74 8.42 105.36 260.29 2.26 99.90
      sdk 0.00 0.00 6.00 741.00 0.06 3.03 8.47 39.87 67.59 1.32 98.95
      sdk 0.00 4.00 25.00 2519.00 0.17 10.07 8.24 80.18 31.24 0.39 98.30
      sdk 0.00 25.00 14.00 1530.00 0.14 6.33 8.59 23.92 15.76 0.64 98.10
      sdk 0.00 2.50 6.50 422.50 0.07 1.73 8.57 55.19 96.98 2.32 99.60
      sdk 0.00 0.50 11.50 135.00 0.11 0.54 9.17 44.41 363.53 6.82 99.95
      sdk 0.00 0.00 8.00 104.00 0.08 0.41 8.89 108.89 617.02 8.93 100.05
      sdk 0.00 0.00 14.50 270.50 0.13 1.08 8.65 69.87 390.25 3.51 100.00
      sdk 0.00 0.00 6.00 154.00 0.05 0.61 8.45 51.68 305.67 6.25 100.05
      sdk 0.00 0.00 9.50 204.50 0.07 0.86 8.93 24.03 139.49 4.67 99.90

      I have tried disabling journaling and increasing the syncdelay parameter (it is now set to 300 seconds). No positive results so far. One thing I noticed is that after a mongod restart everything works pretty well for about 10-15 minutes. After that, disk IO starts rising and within a few minutes device utilization reaches 100%.
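
      For context, those settings correspond to mongod startup options roughly like this (a sketch; the dbpath is illustrative, and since journaling is opt-in in 1.8.x, "disabling" it simply means starting without --journal/--dur):

      # no --journal, and the background flush interval raised from the 60 s default to 300 s
      /opt/mongo/bin/mongod --dbpath /data/db --syncdelay 300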

      I'm really out of ideas on how to deal with this.

            Assignee: scotthernandez (Scott Hernandez)
            Reporter: rmihael@gmail.com (Michael Korbakov)
            Votes: 0
            Watchers: 0
