[SERVER-4201] CLONE - Unable to shut down or kill -9 mongod Created: 03/Nov/11  Updated: 30/Mar/12  Resolved: 11/Nov/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.3, 2.0.0, 2.0.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Joachim Kainz
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux mbl-mdb01 2.6.32.10-90.fc12.x86_64 #1 SMP Tue Mar 23 09:47:08 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Fedora release 12 (Constantine)


Operating System: Linux
Participants:

 Description   

I created a replica set by adding two servers to an existing server holding about 250 GB of data and letting them replicate it. After being in the recovering state for a while, we see the new server go into a state where CPU usage becomes very low, but the load average goes to about 200 or more.

At this point it is impossible to shut down or kill mongod. kill -9 has no effect.

I also noticed that I cannot cat any file in /proc/<pid> belonging to the mongod process.
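
A process that ignores kill -9 while the load average climbs is typically stuck in uninterruptible sleep (state D), usually waiting on I/O inside the kernel. The following is a minimal sketch, not part of the original report, of how such processes could be listed by reading the state field of /proc/<pid>/stat. The function names `parse_stat` and `uninterruptible_processes` are hypothetical. Note that on a system as wedged as the one described here, reading files under /proc/<pid> may itself block, so this is only an illustration of the idea.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    left, _, right = stat_text.rpartition(")")
    pid_text, _, comm = left.partition("(")
    state = right.split()[0]  # single-letter state, e.g. R, S, D, Z
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to inspect
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Signals, including SIGKILL, are only delivered once the process leaves the uninterruptible wait, which is consistent with kill -9 appearing to have no effect.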



 Comments   
Comment by Eliot Horowitz (Inactive) [ 11/Nov/11 ]

Let us know if it comes back after upgrading or if you think it's a mongo issue.

Comment by Joachim Kainz [ 03/Nov/11 ]

Yes, I do.

Just found out that the kernel on the machines where we are running mongo has not been patched in about 4 years.

I am trying to get my datacenter guys to bring the kernel up to date. I will let you know if it reoccurs after patching. I personally believe it will not reoccur.

Comment by Eliot Horowitz (Inactive) [ 03/Nov/11 ]

Are you doing the kill as the mongod user or root?

Comment by Joachim Kainz [ 03/Nov/11 ]

top - 06:24:19 up 14:23, 1 user, load average: 11.05, 8.97, 6.00
Tasks: 287 total, 1 running, 286 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.1%us, 0.2%sy, 0.0%ni, 85.1%id, 13.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 33084004k total, 32781156k used, 302848k free, 21404k buffers
Swap: 15999992k total, 68988k used, 15931004k free, 29540580k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5971 mobile 20 0 666m 74m 2452 S 2.7 0.2 30:14.46 dispatcher.js
9876 mobile 20 0 636m 14m 5360 S 0.7 0.0 0:07.63 metric-loader.j
6139 mongod 20 0 548g 28g 28g D 0.3 89.5 91:02.42 mongod
9799 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.98 metric-loader.j
9829 mobile 20 0 636m 15m 5372 S 0.3 0.0 0:07.68 metric-loader.j
9833 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.72 metric-loader.j
9836 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:08.08 metric-loader.j
9839 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.76 metric-loader.j
9845 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.88 metric-loader.j
9849 mobile 20 0 636m 14m 5360 S 0.3 0.0 0:07.67 metric-loader.j
9854 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.76 metric-loader.j
9856 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.84 metric-loader.j
9863 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.41 metric-loader.j
9866 mobile 20 0 636m 17m 5372 S 0.3 0.1 0:07.68 metric-loader.j
9877 mobile 20 0 636m 18m 5360 S 0.3 0.1 0:07.62 metric-loader.j
9882 mobile 20 0 636m 14m 5360 S 0.3 0.0 0:07.61 metric-loader.j
9888 mobile 20 0 636m 15m 5372 S 0.3 0.0 0:08.10 metric-loader.j
9899 mobile 20 0 636m 16m 5360 S 0.3 0.1 0:07.80 metric-loader.j
9915 mobile 20 0 636m 16m 5360 S 0.3 0.0 0:07.70 metric-loader.j
9916 mobile 20 0 636m 15m 5372 S 0.3 0.0 0:07.93 metric-loader.j
9921 mobile 20 0 636m 16m 5372 S 0.3 0.0 0:07.47 metric-loader.j
9927 mobile 20 0 636m 18m 5360 S 0.3 0.1 0:07.56 metric-loader.j
9936 mobile 20 0 636m 15m 5360 S 0.3 0.0 0:07.87 metric-loader.j
9938 mobile 20 0 636m 18m 5360 S 0.3 0.1 0:07.71 metric-loader.j
9951 mobile 20 0 636m 14m 5360 S 0.3 0.0 0:07.77 metric-loader.j
29944 mobile 20 0 15040 1304 864 R 0.3 0.0 0:00.07 top
1 root 20 0 4128 452 356 S 0.0 0.0 0:00.99 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:00.09 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
6 root RT 0 0 0 0 S 0.0 0.0 0:00.10 migration/1
7 root 20 0 0 0 0 S 0.0 0.0 0:00.36 ksoftirqd/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 watchdog/1

Comment by Joachim Kainz [ 03/Nov/11 ]

$ iostat
Linux 2.6.32.10-90.fc12.x86_64 (mbl-mdb02) 11/03/2011 x86_64 (8 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
2.40 0.00 0.73 16.31 0.00 80.57

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 52.07 12.76 10989.29 659801 568168886
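
For readers interpreting the figures above: iostat reports Blk_read/s and Blk_wrtn/s in blocks, which on 2.4 and later kernels correspond to 512-byte sectors. A small sketch of the conversion to megabytes per second follows. The helper name `blocks_to_mb_per_s` is hypothetical, and the 512-byte block size is an assumption based on the iostat documentation rather than something stated in this ticket.

```python
BLOCK_SIZE_BYTES = 512  # assumed: iostat blocks equal 512-byte sectors


def blocks_to_mb_per_s(blocks_per_second, block_size=BLOCK_SIZE_BYTES):
    """Convert an iostat blocks-per-second figure to MB/s (10^6 bytes)."""
    return blocks_per_second * block_size / 1_000_000


if __name__ == "__main__":
    # Blk_wrtn/s for sda as reported in the iostat output above.
    print(round(blocks_to_mb_per_s(10989.29), 2))
```

Under that assumption the reported write rate is only a few megabytes per second averaged since boot, so the average by itself does not show whether the disk was saturated at the moment mongod hung.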

Comment by Joachim Kainz [ 03/Nov/11 ]

$ iostat
Linux 2.6.32.10-90.fc12.x86_64 (mbl-mdb02) 11/03/2011 x86_64 (8 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
2.40 0.00 0.73 16.28 0.00 80.60

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 52.29 12.82 11037.23 659785 568168206
