[SERVER-3468] Very high cpu usage Created: 22/Jul/11  Updated: 12/Jul/16  Resolved: 27/Jul/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Braden Evans Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Hardware:
2x Quad Core 2.4ghz Xeon (Hyperthreaded)
24gb ram in 2 banks (this is a NUMA system)
SSD storage

We have tried the following OSes:
Windows 2008 R2
Ubuntu Server 11.04 on Hyper-V
Ubuntu Server 11.04

mongod --journal --replSet


Operating System: ALL
Participants:

 Description   
  • After some time we start seeing mongostat ticks with very high (sometimes >100) lock %
  • Throughput falls, CPU usage skyrockets, and I/O stays mostly idle
  • Machine becomes very unresponsive
  • The database never totally stops processing requests but is extremely slow, even for a serverStatus() call
  • Calling db.runCommand({closeAllDatabases:1}) resolves the problem

We have not had much luck isolating the cause of this behaviour, but it does appear to be triggered by heavy reads: we have tested totally saturating the server with writes without this happening.

The timeline that most recently triggered this problem is:

  • Two slaves doing their initial sync
  • Moderate read/write load <1mbs

We have tried running with numactl --interleave=all without any change (it is enabled in this example).
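
For reference, the workaround and the NUMA checks above look roughly like this from a shell (assuming a local mongod on the default port; closeAllDatabases is run against the admin database here):

mongo admin --eval 'db.runCommand({closeAllDatabases:1})'   # workaround: close all databases
numactl --hardware                                          # node layout and per-node free memory
cat /proc/sys/vm/zone_reclaim_mode                          # non-zero means node-local page reclaim is enabled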

#################################################################
During "incident":

Mongostat:
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 6 0 191g 382g 11.6g 0 24 0 2|1 1|2 632b 3k 28 test M 15:47:53
18 4 2 0 0 10 0 191g 382g 11.6g 1 0 0 2|3 1|3 4k 4k 28 test M 15:47:57
1 8 2 0 3 8 1 191g 382g 11.6g 1 231 0 5|0 4|1 1k 30k 28 test M 15:48:01
23 1 2 0 2 8 0 191g 382g 11.6g 10 99.5 0 1|0 1|1 3k 4k 28 test M 15:48:05
1 0 1 0 0 5 0 191g 382g 11.6g 0 0 0 2|3 1|3 1k 3k 28 test M 15:48:08
0 1 0 0 0 9 0 191g 382g 11.6g 0 241 0 3|1 2|2 1k 4k 28 test M 15:48:14
1 1 0 0 0 7 0 191g 382g 11.6g 0 0 0 3|2 2|2 931b 3k 28 test M 15:48:17
1 3 2 0 2 6 0 191g 382g 11.6g 2 264 0 1|0 1|1 1k 28k 28 test M 15:48:21
26 1 1 0 11 5 0 191g 382g 11.6g 65 75.6 0 0|0 1|0 4k 2k 28 test M 15:48:24
0 9 3 0 7 19 0 191g 382g 11.7g 40 22.2 0 0|0 1|0 3k 9k 28 test M 15:48:39
0 0 0 0 0 8 0 191g 382g 11.7g 0 0 0 3|3 2|3 759b 3k 28 test M 15:48:44
3 2 1 0 0 5 0 191g 382g 11.7g 27 174 0 4|0 4|0 2k 2k 28 test M 15:48:47
0 1 1 0 6 4 0 191g 382g 11.7g 546 59.7 0 0|0 1|0 963b 391k 28 test M 15:48:48
0 0 0 0 7 3 0 191g 382g 11.8g 425 0 0 0|0 2|0 304b 4m 28 test M 15:48:49
0 1 1 0 4 4 0 191g 382g 11.8g 250 0 0 0|0 1|0 489b 3k 28 test M 15:48:50
3 0 1 0 0 3 0 191g 382g 11.8g 372 0 0 0|0 1|0 1k 1k 28 test M 15:48:51
0 1 2 0 1 7 0 191g 382g 11.8g 121 0 0 0|1 1|1 821b 3k 28 test M 15:48:56
6 5 2 0 2 15 1 191g 382g 11.8g 78 0 0 0|0 1|0 2k 30k 28 test M 15:49:06
11 4 1 0 2 6 0 191g 382g 11.8g 42 3.7 0 0|1 1|1 2k 28k 28 test M 15:49:07

iostat

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 7.66 0.00 0.00 92.34

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 5.50 102.00 0.00 204 0
sdb 0.00 0.00 0.00 0 0
sdc 0.00 0.00 0.00 0 0
sdd 4.00 0.00 2048.00 0 4096
sde 0.00 0.00 0.00 0 0
sdf 1.00 0.00 8.00 0 16
dm-0 0.00 0.00 0.00 0 0

vmstat

procs ----------memory--------- --swap- ----io--- system- ---cpu---
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 88 77448 21336 23876708 0 0 768 189 120 87 0 6 94 1
3 0 88 76748 21352 23876740 0 0 162 1090 672 833 0 14 86 0
1 0 88 76696 21352 23876628 0 0 0 2164 634 738 0 5 95 0
1 0 88 76952 21360 23876860 0 0 0 2084 666 834 0 9 91 0
1 0 88 77200 21360 23876860 0 0 0 1034 591 725 0 10 90 0
1 0 88 77448 21372 23876852 0 0 0 2062 669 826 0 10 90 0
1 0 88 77572 21384 23876848 0 0 64 2164 718 920 0 5 95 0
1 0 88 77572 21384 23876852 0 0 0 1028 550 655 0 7 93 0
1 0 88 78044 21392 23876988 0 0 0 2056 568 686 0 6 94 0
1 0 88 77392 21392 23876908 0 0 236 1038 719 882 0 10 89 0
2 0 88 77408 21392 23876904 0 0 0 2048 600 706 0 8 92 0
2 0 88 77536 21392 23876908 0 0 0 2054 606 703 0 6 94 0
1 0 88 77872 21408 23876912 0 0 0 1050 657 749 0 12 87 0

The log file shows lots of entries like:
serverStatus was very slow:

{ after basic: 0, middle of mem: 1990, after mem: 1990, after connections: 1990, after extra info: 2550, after counters: 2550, after repl: 2550, after asserts: 2550 }

#################################################################
After calling closeAllDatabases:

insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 152 1 0 0 5 0 191g 382g 108m 0 0 0 0|0 0|0 19k 399k 21 test M 15:51:59
0 152 1 0 0 7 0 191g 382g 108m 0 0 0 0|0 0|0 19k 393k 21 test M 15:52:00
0 294 0 0 0 6 0 191g 382g 108m 0 0 0 0|0 0|0 37k 788k 21 test M 15:52:01
29 172 0 0 2 7 0 191g 382g 108m 0 0 0 0|0 0|0 30k 516k 21 test M 15:52:02
11 180 0 0 0 6 0 191g 382g 108m 1 0.5 0 0|0 0|0 23k 431k 21 test M 15:52:03
0 193 1 0 0 9 0 191g 382g 108m 0 0 0 0|0 0|0 29k 515k 21 test M 15:52:04
0 192 1 0 0 7 0 191g 382g 108m 0 0 0 0|0 0|0 29k 519k 21 test M 15:52:05
0 260 0 0 0 8 0 191g 382g 108m 0 0 0 0|0 0|0 34k 743k 21 test M 15:52:06
0 221 0 0 0 6 0 191g 382g 108m 0 0 0 0|0 0|0 31k 537k 21 test M 15:52:07
0 179 1 0 0 6 0 191g 382g 108m 0 0 0 0|0 0|0 28k 508k 21 test M 15:52:08

CPU returns to mostly idle under this load



 Comments   
Comment by Eliot Horowitz (Inactive) [ 27/Jul/11 ]

Got it.
See: SERVER-3497

Comment by Braden Evans [ 27/Jul/11 ]

Setting zone_reclaim_mode should be unnecessary if kernel support for NUMA is fully disabled; basically, we just made sure it was off. The kernel will automatically set it to 1 if it detects a NUMA machine where the cost of communicating with a non-local node is under a specific threshold.
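
For reference, the current value can be checked directly (non-zero means node-local reclaim is enabled):

cat /proc/sys/vm/zone_reclaim_mode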

Comment by Eliot Horowitz (Inactive) [ 27/Jul/11 ]

Interesting - did you test each independently?

Comment by Braden Evans [ 27/Jul/11 ]

We might have solved the problem: the server has been up for over 12 hours and performance is still nominal, with all CPUs enabled and HyperThreading on.

Solution: We disabled kernel NUMA support at boot time (effectively forcing memory allocation to be interleaved between nodes for all processes) and set /proc/sys/vm/zone_reclaim_mode to zero.
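
Roughly, one way to make those changes on a stock Ubuntu/GRUB setup (the exact mechanism we used may differ):

# 1. Disable kernel NUMA support at boot: add numa=off to the kernel command line,
#    e.g. GRUB_CMDLINE_LINUX_DEFAULT="... numa=off" in /etc/default/grub, then:
sudo update-grub && sudo reboot

# 2. Keep zone_reclaim_mode off:
echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode
# to persist it, add "vm.zone_reclaim_mode = 0" to /etc/sysctl.conf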

Comment by Braden Evans [ 26/Jul/11 ]

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida arat epb dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.39
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

+7 more identical cores

We are going to pull a processor out of the machine and test again

Comment by Eliot Horowitz (Inactive) [ 26/Jul/11 ]

It's probably flushing the kernel mapping structures, which is why I think it's a NUMA-type issue.

Can you send /proc/cpuinfo?

Comment by Braden Evans [ 26/Jul/11 ]

I think that the best clue we have right now is that calling db.runCommand({closeAllDatabases:1}) immediately restores performance to normal levels. Any ideas?

Comment by Braden Evans [ 26/Jul/11 ]

Yes, we have tried interleaved with hyper-threading both on and off; no difference that I could observe.

Comment by Eliot Horowitz (Inactive) [ 26/Jul/11 ]

You tried interleaved with hyper threading off?

Comment by Braden Evans [ 26/Jul/11 ]

Hi Eliot, it is already turned off. We have 8 hardware cores (16 logical cores when HT is enabled).

Comment by Eliot Horowitz (Inactive) [ 26/Jul/11 ]

Can you try turning off hyper threading?

I just saw a very similar situation and turning off hyper threading fixed it.

Comment by Braden Evans [ 26/Jul/11 ]

Summary of the situation so far:

0. Mongod is running with numactl --interleave=all and hyper-threading is disabled
1. Machine is idle; we start a mongodump; no other operations are running in the background
2. Dump runs at normal speed for 10 to 15 minutes; disk and CPU usage are high but performance is excellent
3. After 10 to 15 minutes, most disk activity ceases, CPU usage remains high, and the mongodump process slows to a crawl - this is the "degraded" state
4. At this point, we can stop the dump and CPU activity will remain high while disk activity remains minimal.
5. While in this degraded state, all database operations are very slow, **even hours after mongodump was stopped**. Even calls to serverStatus() are incredibly slow.
6. Mongod will remain in this degraded state for several hours or until we run db.runCommand({closeAllDatabases:1});
7. db.runCommand({closeAllDatabases:1}) returns in 1-2 seconds and performance is immediately restored to normal levels
8. If we run mongod without numactl, the database will perform flawlessly for hours (instead of 15 minutes) but will eventually degrade
9. In the degraded state, mongod locks up a CPU core; iostat reveals very little disk activity, and vmstat shows that the system is not waiting for IO or otherwise distressed (see the checks sketched below)
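
A rough sketch of the checks behind point 9 (intervals and flags are just the ones we typically use):

top -H -p $(pidof mongod)   # per-thread CPU; in the degraded state one thread sits near 100%
iostat -x 2                 # extended per-device stats; %util stays close to zero
vmstat 1                    # no swapping (si/so) and no IO wait (wa)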

Comment by Braden Evans [ 26/Jul/11 ]

Hi Mathias, here is the output from numactl --hardware:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 12279 MB
node 0 free: 64 MB
node 1 cpus: 4 5 6 7
node 1 size: 12288 MB
node 1 free: 12 MB
node distances:
node 0 1
0: 10 21
1: 21 10

Unfortunately, I found mongod to be in the degraded state early this morning after performing flawlessly for hours.

Right now the database is up, albeit running very slowly and locking up a core at 100%; the machine is otherwise idle. iostat reveals very little disk activity, and vmstat shows that the machine is not waiting for IO or otherwise distressed.

Interestingly, if we call db.runCommand({closeAllDatabases:1}), database performance is immediately restored to normal levels.

Comment by Mathias Stearn [ 26/Jul/11 ]

Could you send the output of numactl --hardware both while seeing the issue and with it solved?

Comment by Eliot Horowitz (Inactive) [ 26/Jul/11 ]

Mathias - we should try and figure out why numactl was hurting in this case.

Comment by Braden Evans [ 25/Jul/11 ]

Hi Eliot, we have identified the issue. The problem is caused by running mongod with "numactl --interleave=all", as instructed by this message logged during server startup:

    • WARNING: You are running on a NUMA machine.
    • We suggest launching mongod like this to avoid performance problems:
    • numactl --interleave=all mongod [other options]

As it turns out, this recommendation causes performance problems instead of solving them. Running mongod without numactl solves all the issues we were experiencing.
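
For anyone reproducing this, one way to confirm which memory policy mongod actually ended up with is to look at its NUMA maps (the pid lookup here is just illustrative):

cat /proc/$(pidof mongod)/numa_maps | head
# each mapping's policy field shows whether it is interleaved across nodes or left at the default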

Thank you for your help Eliot, we really appreciate it!

Comment by Braden Evans [ 25/Jul/11 ]

Hi Eliot,

As per your suggestion, I just ran the 1.9 nightly mongodump --forceTableScan without any other command line options. Same as before, we experienced performance degradation 10-15 minutes after the dump started. According to vmstat (below), the system is not under memory pressure or waiting for IO. Mongostat (also below) shows that the database is still serving requests, albeit at a much reduced pace.

procs ----------memory--------- --swap- ----io--- system- ---cpu---
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 2684 81620 31768 23707820 0 0 48 0 1075 1284 0 13 87 0
1 0 2684 81620 31768 23707908 0 0 0 0 584 790 0 14 86 0
1 0 2684 81620 31768 23707888 0 0 28 0 603 845 0 12 88 0
1 0 2684 81496 31768 23707916 0 0 0 0 613 878 0 12 88 0
1 0 2684 81372 31768 23707916 0 0 0 0 543 776 0 12 88 0
1 0 2684 81372 31768 23707916 0 0 0 0 1037 1191 0 12 87 0
1 0 2684 81372 31776 23707912 0 0 0 12 675 917 0 14 81 6
1 0 2684 81496 31776 23707916 0 0 0 0 649 879 0 12 88 0
2 0 2684 81496 31776 23707916 0 0 0 0 569 756 0 13 87 0
1 0 2684 81496 31784 23707916 0 0 0 32 579 804 0 13 87 0
1 0 2684 81620 31784 23707916 0 0 0 0 520 803 0 14 86 0
1 0 2684 81620 31784 23707916 0 0 0 0 535 747 0 12 88 0
1 0 2684 81620 31784 23707916 0 0 0 0 572 780 0 13 87 0
1 0 2684 78124 31784 23708784 0 0 0 0 1547 1775 0 13 87 0
1 0 2684 78124 31792 23708784 0 0 0 16 1603 1803 0 13 87 0
1 0 2684 78248 31792 23708784 0 0 0 0 1586 1767 0 13 87 0
1 0 2684 78124 31792 23708784 0 0 0 0 1576 1853 0 13 87 0
1 0 2684 77876 31816 23708760 0 0 0 48 1582 1773 0 13 86 1
1 0 2684 77876 31816 23708784 0 0 0 0 1601 1756 0 13 87 0
1 0 2684 78000 31816 23708396 0 0 0 0 1681 1836 0 12 87 0
1 0 2684 78000 31816 23708324 0 0 0 4 1723 1835 0 13 87 0
1 0 2684 88664 31816 23708324 0 0 0 16 1722 1927 0 13 87 0
1 0 2684 88424 31816 23708300 0 0 0 0 1718 1867 0 13 87 0

insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 375b 6k 7 wireclub M 14:12:53
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 171b 3k 7 wireclub M 14:12:54
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:12:55
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:12:56
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:12:57
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:12:58
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:12:59
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:00
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:01
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:02
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:03
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:04
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 375b 6k 7 wireclub M 14:13:05
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 124b 3k 7 wireclub M 14:13:06
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:07
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:08
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:09
0 0 0 0 0 2 0 191g 382g 18.6g 0 0 0 0|0 1|0 124b 3k 7 wireclub M 14:13:10
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:11
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:12
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:13
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:14
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 422b 6k 7 wireclub M 14:13:15
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 124b 3k 7 wireclub M 14:13:16
0 0 0 0 2 4 0 191g 382g 18.6g 0 0 0 0|0 2|0 360b 4k 7 wireclub M 14:13:17
0 0 0 0 5 3 0 191g 382g 18.6g 71 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:18
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:19
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:20
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:21
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:22
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 1 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 360b 4k 7 wireclub M 14:13:23
0 0 0 0 0 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:24
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:25
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:26
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:27
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:28
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:29
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:30
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:31
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:32
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:33
0 0 0 0 2 5 0 191g 382g 18.6g 2 0 0 0|0 2|0 425b 3k 7 wireclub M 14:13:38
0 0 0 0 1 8 0 191g 382g 18.6g 0 0 0 0|0 2|0 930b 7k 7 wireclub M 14:13:46
0 0 0 0 5 2 1 191g 382g 18.6g 86 0 0 0|0 1|0 186b 4k 7 wireclub M 14:13:47
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:48
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:49
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:51
0 0 0 0 1 3 0 191g 382g 18.6g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 14:13:52
0 0 0 0 0 4 0 191g 382g 18.6g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 14:13:53

Comment by Eliot Horowitz (Inactive) [ 25/Jul/11 ]

Can you try a mongodump from the 1.9 nightly (using whatever server version you have) and use --forceTableScan?

If that behaves differently, that will tell us a lot.
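
Something along these lines (host, port and output path are placeholders):

mongodump --host localhost --port 27017 --forceTableScan --out /tmp/dump-nightly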

Also, what command line options are you passing if any?

Comment by Braden Evans [ 25/Jul/11 ]

Hi Eliot, it seems to be transferring, albeit very slowly.

Here is a play-by-play of what we are observing:

1. Machine is idle; we start a mongodump; no other operations are running in the background
2. Dump runs at normal speed for 10 to 15 minutes; disk and CPU usage are high but performance is excellent
3. After 10 to 15 minutes, most disk activity ceases, CPU usage remains high, and the mongodump process slows to a crawl - this is the "degraded" state
4. At this point, we can stop the dump and CPU activity will remain high while disk activity remains minimal.
5. While in this degraded state, all database operations are very slow, even hours after mongodump is stopped. Even calls to serverStatus() are incredibly slow.
6. Mongod will remain in this degraded state for several hours or until we run db.runCommand({closeAllDatabases:1});
7. db.runCommand({closeAllDatabases:1}) returns in 1-2 seconds and performance is immediately restored to normal levels

There are two crucial questions that, if answered, could lead to understanding this problem better:
1. What would be keeping mongod busy for hours after the mongodump stopped? (No other calls were being made.)
2. Why does closing all the databases fix it?

SSDs: sda, sdb, sdc, sde, sdf
HDD: sdd

Below: performance information captured while MongoDB was in the degraded state, 15 minutes after I started a mongodump


iostat 10 -x

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 4.00 0.00 16.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.36 0.00 12.17 0.00 0.00 87.47

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 9.00 0.00 31.00 0.00 160.00 10.32 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.37 0.00 12.39 0.00 0.00 87.24

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.12 0.00 12.35 0.00 0.00 87.53

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.25 0.00 12.69 0.25 0.00 86.82

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 21.00 0.00 18.00 0.00 360.00 0.00 40.00 0.01 0.56 0.56 0.00 0.56 1.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 1.00 0.00 3.00 0.00 16.00 10.67 0.03 10.00 0.00 10.00 3.33 1.00
dm-0 0.00 0.00 24.00 0.00 96.00 0.00 8.00 0.08 3.33 3.33 0.00 0.42 1.00

vmstat

procs ----------memory--------- --swap- ----io--- system- ---cpu---
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 2684 79500 34384 23689380 0 0 0 4 1615 1855 0 13 87 0
1 0 2684 79500 34384 23689384 0 0 0 0 1619 1903 0 14 86 0
1 0 2684 79500 34384 23689384 0 0 0 0 1624 1847 0 12 88 0
1 0 2684 79500 34384 23689384 0 0 0 0 1617 1848 0 14 86 0
1 0 2684 79484 34384 23689384 0 0 0 4 1639 1968 0 13 87 0
1 0 2684 79484 34384 23689384 0 0 0 0 1572 1869 0 15 85 0
1 0 2684 79484 34384 23689384 0 0 0 0 1578 1891 0 10 89 0
1 0 2684 80576 34384 23689384 0 0 0 0 1654 2084 0 12 88 0
2 0 2684 80724 34384 23689092 0 0 0 0 1616 1946 0 12 88 0
1 0 2684 80848 34384 23688152 0 0 0 0 1704 2217 0 14 86 0
1 0 2684 80980 34384 23688180 0 0 0 0 1666 1969 0 12 88 0
1 0 2684 81132 34384 23688180 0 0 0 0 1600 1868 0 13 87 0
1 0 2684 81132 34384 23688180 0 0 0 0 1593 1912 0 12 87 0
1 0 2684 81256 34384 23688180 0 0 0 0 1639 1990 0 13 87 0
1 0 2684 81504 34392 23688172 0 0 0 40 1651 1948 0 12 88 0

mongostat

insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 1 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 13:20:27
0 0 0 0 0 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 251b 3k 7 wireclub M 13:20:28
0 0 0 0 1 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 13:20:29
0 0 0 0 0 4 0 191g 382g 16.7g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 13:20:30
0 0 0 0 1 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 13:20:31
0 0 0 0 0 4 0 191g 382g 16.7g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 13:20:32
0 0 0 0 1 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 13:20:33
0 0 0 0 0 4 0 191g 382g 16.7g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 13:20:34
0 0 0 0 1 3 0 191g 382g 16.7g 0 0 0 0|0 1|0 233b 4k 7 wireclub M 13:20:35
0 0 0 0 0 4 0 191g 382g 16.7g 0 0 0 0|0 1|0 313b 4k 7 wireclub M 13:20:36

sar -B 1

01:21:09 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
01:21:10 PM 0.00 0.00 544.00 0.00 1187.00 0.00 3705835.00 0.00 0.00
01:21:11 PM 0.00 0.00 499.00 0.00 1063.00 0.00 3701894.00 0.00 0.00
01:21:12 PM 0.00 0.00 498.00 0.00 1062.00 0.00 3706644.00 0.00 0.00
01:21:13 PM 0.00 0.00 490.00 0.00 1018.00 0.00 3698563.00 0.00 0.00
01:21:14 PM 0.00 0.00 499.00 0.00 1065.00 0.00 3700551.00 0.00 0.00
01:21:15 PM 0.00 0.00 496.00 0.00 1025.00 0.00 3706918.00 0.00 0.00
01:21:16 PM 0.00 0.00 499.00 0.00 1070.00 0.00 3699924.00 0.00 0.00
01:21:17 PM 0.00 0.00 496.00 0.00 1026.00 0.00 3704611.00 0.00 0.00
01:21:18 PM 0.00 0.00 499.00 0.00 1064.00 0.00 3700359.00 0.00 0.00
01:21:19 PM 0.00 0.00 497.00 0.00 1032.00 0.00 3702982.00 0.00 0.00
01:21:20 PM 0.00 0.00 543.00 0.00 1182.00 0.00 3704660.00 0.00 0.00
01:21:21 PM 0.00 4.00 502.00 0.00 1071.00 0.00 3704547.00 0.00 0.00
01:21:22 PM 0.00 16.00 499.00 0.00 1070.00 0.00 3704263.00 0.00 0.00
01:21:23 PM 0.00 0.00 497.00 0.00 1027.00 0.00 3702598.00 0.00 0.00
01:21:24 PM 0.00 0.00 498.00 0.00 1063.00 0.00 3709844.00 0.00 0.00
01:21:25 PM 0.00 0.00 497.00 0.00 1032.00 0.00 3698851.00 0.00 0.00
01:21:26 PM 0.00 0.00 498.00 0.00 1062.00 0.00 3710471.00 0.00 0.00
01:21:27 PM 0.00 8.00 496.00 0.00 1026.00 0.00 3701798.00 0.00 0.00
01:21:28 PM 0.00 0.00 496.00 0.00 1062.00 0.00 3707988.00 0.00 0.00
01:21:29 PM 0.00 0.00 496.00 0.00 1026.00 0.00 3706659.00 0.00 0.00
01:21:30 PM 0.00 0.00 543.00 0.00 1182.00 0.00 3700647.00 0.00 0.00

Comment by Eliot Horowitz (Inactive) [ 25/Jul/11 ]

That looks like it's just transferring?

How fast is it transferring?

Comment by Braden Evans [ 23/Jul/11 ]

The disks are all SSDs; sdc holds the database currently being synced.

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.12 0.00 12.40 1.18 0.00 86.30

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 181.00 0.00 23168.00 0.00 256.00 0.18 0.99 0.99 0.00 0.72 13.00
sde 0.00 0.50 0.00 1.50 0.00 8.00 10.67 0.01 10.00 0.00 10.00 3.33 0.50
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.06 0.00 22.06 0.88 0.00 77.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 177.50 0.00 20544.00 0.00 231.48 0.18 1.04 1.04 0.00 0.68 12.00
sde 0.00 0.00 0.00 0.50 0.00 2.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.18 0.00 20.44 0.36 0.00 79.03

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 2.00 0.00 8.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 101.00 0.00 8438.00 0.00 167.09 0.07 0.64 0.64 0.00 0.50 5.00
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 24.78 0.06 0.00 75.16

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdc 0.00 0.00 9.00 0.00 226.00 0.00 50.22 0.07 7.22 7.22 0.00 0.56 0.50
sde 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdf 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 26.34 0.00 0.00 73.66

Comment by Eliot Horowitz (Inactive) [ 23/Jul/11 ]

Can you run iostat -x 2?
It looks like the disk is probably saturated.

Comment by Braden Evans [ 23/Jul/11 ]

Another case of this happening; currently all the database is doing is an initial sync to one node:

Top:
2046 mongodb 20 0 382g 14g 14g S 100 60.8 7:18.94 mongod

vmstat
procs ----------memory--------- --swap- ----io--- system- ---cpu---
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 1339824 56904 22429384 0 0 2906 2056 482 512 2 6 89 3
1 0 0 1339536 56904 22429132 0 0 0 2048 637 811 0 12 88 0
1 0 0 1339544 56908 22429136 0 0 0 4 656 785 0 13 87 0
1 0 0 1339552 56908 22429136 0 0 0 2048 631 760 0 13 87 0
1 0 0 1339680 56908 22429136 0 0 0 2048 664 798 0 12 88 0
1 0 0 1339680 56916 22429128 0 0 0 12 613 746 0 15 85 1
1 0 0 1339680 56916 22429136 0 0 0 2112 655 787 0 14 86 0
1 0 0 1339924 56916 22429136 0 0 0 2048 637 812 0 14 86 0
1 0 0 1339924 56916 22429136 0 0 0 0 634 767 0 10 90 0
1 0 0 1339924 56916 22429136 0 0 0 2048 617 748 0 15 85 0
1 0 0 1340048 56916 22429136 0 0 0 0 659 821 0 14 86 0
1 0 0 1340048 56924 22429132 0 0 0 2060 628 772 0 14 84 1
1 0 0 1340404 56924 22428232 0 0 0 2048 729 1031 0 9 91 0
1 0 0 1340512 56924 22428200 0 0 0 4 627 781 0 12 88 0
1 0 0 1340512 56924 22428200 0 0 0 2048 654 781 0 13 87 0
1 0 0 1340760 56924 22428200 0 0 0 0 630 746 0 13 87 0
1 0 0 1340768 56924 22428020 0 0 0 2048 656 814 0 12 88 0
1 0 0 1340768 56924 22428020 0 0 0 2092 653 792 0 13 87 0
1 0 0 1340800 56940 22428004 0 0 0 36 676 802 0 12 87 1

Mongostat:
insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 0 0 0 0 2 0 191g 382g 14.2g 0 0 0 0|0 0|0 124b 2k 14 wireclub M 17:32:12
0 0 0 0 0 4 1 191g 382g 14.2g 0 0 0 0|0 0|0 378b 3k 14 wireclub M 17:32:13
0 0 0 0 0 2 0 191g 382g 14.2g 0 0 0 0|0 0|0 124b 2k 14 wireclub M 17:32:14
0 0 0 0 0 4 0 191g 382g 14.2g 0 0 0 0|0 0|0 378b 3k 14 wireclub M 17:32:15
0 1 1 0 0 3 0 191g 382g 14.2g 0 0 0 0|0 0|0 362b 3k 14 wireclub M 17:32:16
0 0 0 0 0 4 0 191g 382g 14.2g 0 0 0 0|0 0|0 378b 3k 14 wireclub M 17:32:17
0 0 0 0 0 2 0 191g 382g 14.2g 0 0 0 0|0 0|0 124b 2k 14 wireclub M 17:32:18
0 0 0 0 0 4 0 191g 382g 14.2g 0 0 0 0|0 0|0 378b 3k 14 wireclub M 17:32:19
0 0 0 0 0 2 0 191g 382g 14.2g 0 0 0 0|0 0|0 124b 2k 14 wireclub M 17:32:20
0 1 1 0 0 5 0 191g 382g 14.2g 0 0 0 0|0 0|0 616b 3k 14 wireclub M 17:32:21

Fri Jul 22 17:33:24 [conn17] serverStatus was very slow:

{ after basic: 0, middle of mem: 3320, after mem: 3320, after connections: 3320, after extra info: 4760, after counters: 4760, after repl: 4760, after asserts: 4760 }

Fri Jul 22 17:33:24 [conn17] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 4803ms
Fri Jul 22 17:33:37 [conn6] serverStatus was very slow:

{ after basic: 0, middle of mem: 3280, after mem: 3280, after connections: 3280, after extra info: 17790, after counters: 17790, after repl: 17790, after asserts: 17790 }

Fri Jul 22 17:33:37 [conn6] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 17985ms
Fri Jul 22 17:33:37 [conn10] getmore gridfs.fs.chunks cid:1249412216916989076 getMore: {} bytes:4211712 nreturned:126 exhaust 37695ms
Fri Jul 22 17:33:38 [conn17] serverStatus was very slow:

{ after basic: 0, middle of mem: 12750, after mem: 12750, after connections: 12750, after extra info: 12750, after counters: 12750, after repl: 12750, after asserts: 12750 }

Fri Jul 22 17:33:38 [conn17] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 12889ms
Fri Jul 22 17:34:10 [conn6] serverStatus was very slow:

{ after basic: 0, middle of mem: 11560, after mem: 11560, after connections: 11560, after extra info: 12220, after counters: 12220, after repl: 12220, after asserts: 12220 }

Fri Jul 22 17:34:10 [conn6] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 12329ms
Fri Jul 22 17:34:10 [conn17] serverStatus was very slow:

{ after basic: 0, middle of mem: 11600, after mem: 11600, after connections: 11600, after extra info: 12930, after counters: 12930, after repl: 12930, after asserts: 12930 }

Fri Jul 22 17:34:10 [conn17] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 13041ms
Fri Jul 22 17:34:15 [conn6] serverStatus was very slow:

{ after basic: 0, middle of mem: 2300, after mem: 2300, after connections: 2300, after extra info: 4290, after counters: 4290, after repl: 4290, after asserts: 4290 }

Fri Jul 22 17:34:15 [conn6] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 4329ms
Fri Jul 22 17:34:15 [conn17] serverStatus was very slow:

{ after basic: 0, middle of mem: 2320, after mem: 2320, after connections: 2320, after extra info: 4310, after counters: 4310, after repl: 4310, after asserts: 4310 }

Fri Jul 22 17:34:15 [conn17] query admin.$cmd ntoreturn:1 command:

{ serverStatus: 1 }

reslen:1593 4346ms

Comment by Braden Evans [ 23/Jul/11 ]

We can now rule out hyperthreading being the cause as well.
