[SERVER-3236] High CPU wait load on capped collection reaching its limit Created: 09/Jun/11 Updated: 12/Jul/16 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Michael Korbakov | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | performance | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux ip-10-164-19-105 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux, EC2 m2.2xlarge instance |
| Operating System: | Linux |
| Participants: |
| Description |
|
We are seeing very strange behavior of mongo on our Nimble server. We use mongodb to run the cache for our product. The cache consists of a single capped collection with a few indexed fields. Here are the stats for it:

db> db.cache.stats()

It runs on an m2.2xlarge EC2 instance that has 34 GB of memory. Here is db.serverStatus().mem:

> db.serverStatus().mem
{ "bits" : 64, "resident" : 6300, "virtual" : 34749, "supported" : true, "mapped" : 32960 }

The load on the server isn't really high (the lock percentage is terrible, however). All updates are on non-indexed fields. All queries use indexes. The problem I have is very high disk IO activity, which causes the lock levels described above. Here's the output of iostat:

[root@ip-10-164-1-195 ~]# iostat -xdm /dev/sdk 2
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util

I have tried disabling journaling and increasing the syncdelay parameter (it is now set to 300 seconds), with no positive results so far. One thing I have noticed is that after a mongo restart everything works pretty well for about 10-15 minutes; after that, disk IO starts rising and within a few minutes device utilization reaches 100%. I'm really out of ideas on how to deal with it. |
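For reference, the diagnostics quoted above can be reproduced from the mongo shell. The following is a minimal sketch, not the reporter's actual setup: the capped-collection size and the indexed field name "key" are illustrative assumptions; only the collection name "cache" and the stats/serverStatus commands come from the report.

// Minimal sketch; the 1 GB cap and the "key" field are assumptions for illustration
db.createCollection("cache", { capped: true, size: 1024 * 1024 * 1024 });
db.cache.ensureIndex({ key: 1 });        // 1.8-era shell helper for creating an index
db.cache.stats();                        // storage and extent details for the capped collection
db.serverStatus().mem;                   // resident/virtual/mapped memory, as quoted above
db.serverStatus().globalLock;            // lock statistics behind the "lock percentage"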
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ] |
|
Thanks |
| Comment by Michael Korbakov [ 02/Sep/11 ] |
|
The problem turned out to be the EBS snapshots that we were using for backups. This cache collection was under heavy updates, and snapshotting put a heavy IO load on it. We solved the problem by reducing the frequency of the snapshots. |
| Comment by Eliot Horowitz (Inactive) [ 30/Aug/11 ] |
|
Any update from your side? |
| Comment by Michael Korbakov [ 10/Jun/11 ] |
|
No, there are no other collections in this database. We already tried profiling. There are three different operations run on this collection: inserting a new cache record, invalidating a cache record, and fetching a cache record. Operations of each type were reported as "slow" by the profiler at different times. One thing that I forgot to mention in the initial problem description: we're running backups using EBS snapshots. We don't stop the Mongo server; we take the snapshot "on the fly", relying on journaling for recovery. Maybe these backups contribute to the problem. Unfortunately I can't get actual mongostat/iostat numbers now – we're experimenting on our own trying to solve this issue, and the current configuration isn't the one described above. We'll get back to the original configuration with the capped collection after the weekend, and I'll attach the stats then. |
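As an illustration of the three operation types described in this comment, a minimal mongo shell sketch might look like the following; the field names ("key", "value", "valid") are hypothetical stand-ins, since the actual schema is not shown anywhere in this ticket.

db.cache.insert({ key: "k1", value: "payload", valid: true });   // insert a new cache record
db.cache.update({ key: "k1" }, { $set: { valid: false } });      // invalidate a record (update touches a non-indexed field)
db.cache.findOne({ key: "k1", valid: true });                    // fetch a record via the indexed field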
| Comment by Scott Hernandez (Inactive) [ 09/Jun/11 ] |
|
Aside from the capped collection, are there other collections on the same server? What updates are you doing? Can you turn on database profiling and report some of the slowest operations that you see? http://www.mongodb.org/display/DOCS/Database+Profiler Also, can you provide a few minutes of mongostat/iostat numbers (as an attachment)? |
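Turning on the profiler as suggested here can be done from the mongo shell; the sketch below is illustrative, and the 100 ms threshold is an assumed value, not one from this ticket.

db.setProfilingLevel(1, 100);                               // log operations slower than 100 ms
db.system.profile.find().sort({ millis: -1 }).limit(10);    // show the slowest recorded operations
db.setProfilingLevel(0);                                    // switch profiling off again when done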