[SERVER-4353] Getting faults and high IO % util but still have 30GB of RAM free Created: 22/Nov/11  Updated: 29/May/12  Resolved: 26/Jan/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Ben Wyrosdick Assignee: Daniel Pasette (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 11.04


Participants:

 Description   

We have a customer who is seeing slow response times on queries over a 75GB collection. They are on a 68GB EC2 box that does not seem to be using all of its RAM, yet it is saturating disk IO (see iostat) and page faulting (see mongostat).

== iostat ==
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.03   0.00     0.08    19.70    0.00  80.19

Device   rrqm/s  wrqm/s      r/s   w/s     rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  r_await  w_await  svctm   %util
xvdap1     0.00    0.00     0.00  9.00      0.00   36.00      8.00      0.10   11.11     0.00    11.11   1.67    1.50
xvdb       0.00    0.00     0.00  5.50      0.00   22.00      8.00      0.04    7.27     0.00     7.27   1.82    1.00
xvdfp1     1.00    0.00   316.50  0.00   5990.00    0.00     37.85     51.85  156.45   156.45     0.00   2.91   92.00
xvdfp2     1.50    0.00   341.50  3.50   6784.00  132.00     40.09     74.00  214.16   213.95   234.29   2.75   95.00
xvdfp3     1.50    0.00   291.00  0.00   5758.00    0.00     39.57     45.48  135.58   135.58     0.00   3.06   89.00
xvdfp4     2.00    0.00   353.50  0.00   6930.00    0.00     39.21     29.96   90.10    90.10     0.00   2.18   77.00
md0        0.00    0.00  1257.50  6.00  24208.00  128.00     38.52      0.00    0.00     0.00     0.00   0.00    0.00
dm-0       0.00    0.00  1257.50  3.00  24208.00  128.00     38.61    202.56  155.54   155.21   291.67   0.79  100.00
xvdc       0.00    0.00     0.00  0.00      0.00    0.00      0.00      0.00    0.00     0.00     0.00   0.00    0.00

== mongostat ==
insert  query  update  delete  getmore  command  flushes  mapped  vsize    res  faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn   set  repl      time
     0    253      99       0        2       29        0    164g   165g  34.9g      11      10.2           0    0|1    1|1   135k      2m   249  xxxx     M  19:52:12
     0    258      75       0        2       31        0    164g   165g  34.9g      19        17           0    0|0    0|0   120k    718k   249  xxxx     M  19:52:13
     0    191      89       0        4       26        0    164g   165g  34.9g      17      19.2           0    0|0    0|0   134k    828k   249  xxxx     M  19:52:14
     0    241     105       0        5       28        0    164g   165g  34.9g      11       1.7           0    0|0    0|0   157k      1m   248  xxxx     M  19:52:15
     0    172      86       0        5       20        0    164g   165g  34.9g      19      32.1         5.5    6|3    7|3   118k    950k   249  xxxx     M  19:52:16
     0    224     128       0        1       17        0    164g   165g  34.9g      22      38.1           0    0|0    0|0   178k      3m   249  xxxx     M  19:52:17
     0    221     151       0        2       23        0    164g   165g  34.9g      13      11.2           0    6|4    6|5   204k    751k   249  xxxx     M  19:52:18
     0    162      76       0        2       54        0    164g   165g  34.9g      23      41.6           0    5|3    6|3   119k    599k   249  xxxx     M  19:52:19
     0    141      64       0        5       17        0    164g   165g  34.9g      24        40           0   11|1   11|2    96k      1m   249  xxxx     M  19:52:20
     0     90      38       0        2        8        0    164g   165g  34.9g      22      79.8           0    5|4    7|4    49k      1m   249  xxxx     M  19:52:21

== free ==
                    total   used   free  shared  buffers  cached
Mem:                68689  68353    336       0       26   65791
-/+ buffers/cache:          2535  66153
Swap:               20473      7  20466

Hooking up MMS now. I will write back once it is set up.



 Comments   
Comment by Daniel Pasette (Inactive) [ 24/Dec/11 ]

Did you end up setting up MMS? Under which account name?

Comment by Scott Hernandez (Inactive) [ 22/Nov/11 ]

You don't have free memory: all of it is in use for cache. Your data is larger than memory and you seem to be accessing data not in memory, hence the faults.

 
       total used free shared buffers cached
Mem:   68689 68353 336 0 26 65791

MMS will help a bit to analyze things.
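
To make the arithmetic behind the free output explicit, here is a minimal Python sketch. It assumes the older procps column layout shown in this ticket (total, used, free, shared, buffers, cached) and that the values are in MB; the function names are hypothetical and only for illustration. It shows that nearly all of the "used" memory is page cache, which on Linux also holds pages of memory-mapped files, rather than memory that is sitting idle.

```python
def parse_free_mem_line(line):
    """Parse the 'Mem:' line of free output (old procps layout) into a dict."""
    names = ["total", "used", "free", "shared", "buffers", "cached"]
    fields = line.split()
    # fields[0] is the "Mem:" label; the next six fields are the numeric columns.
    return dict(zip(names, (int(v) for v in fields[1:7])))


def summarize(mem):
    """Split 'used' into page cache and everything else."""
    cache = mem["buffers"] + mem["cached"]
    return {
        "page_cache_mb": cache,
        "used_excluding_cache_mb": mem["used"] - cache,
        "free_plus_cache_mb": mem["free"] + cache,
    }


# The 'Mem:' line pasted in this ticket.
mem = parse_free_mem_line("Mem: 68689 68353 336 0 26 65791")
summary = summarize(mem)
print(summary)
# {'page_cache_mb': 65817, 'used_excluding_cache_mb': 2536, 'free_plus_cache_mb': 66153}
```

The 66153 figure matches the "free" column of the "-/+ buffers/cache" line in the description; the 1 MB difference from the 2535 shown there is presumably rounding inside free itself. In other words, the roughly 66GB reported on that line is cache already in use, not memory waiting to be claimed.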

Comment by Ben Wyrosdick [ 22/Nov/11 ]

So I didn't finish my thought: is there anything we need to do or change to make it consume the rest of the memory? Since it is faulting, it seems like using more RAM would help.

Comment by Ben Wyrosdick [ 22/Nov/11 ]

The paste failed, so here are screenshots.

http://cl.ly/2n451A1l122h0Z3c3O2R
http://cl.ly/1H100z022v3A180f1V0Y

Generated at Thu Feb 08 03:05:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.