[SERVER-18942] MongoDB enters an infinite loop when the client process is killed Created: 12/Jun/15  Updated: 03/Aug/15  Resolved: 03/Aug/15

Status: Closed
Project: Core Server
Component/s: Internal Code, Networking
Affects Version/s: 2.6.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: ZhangKun [X] Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Platform 7 08/10/15
Participants:

 Description   

I wrote a MongoDB client in Go that works with MongoDB data. When I kill my client process, mongod enters an infinite loop. I used htop to see what was happening:

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
1211 root 20 0 1395M 138M 97656 S 100. 1.8 1h16:10 /root/mongodb/bin/mongod -f /home/hunter/hunterServer/rundir/conf/mongodb.conf -fork
347 root 20 0 1395M 138M 97656 R 100. 1.8 1h10:24 /root/mongodb/bin/mongod -f /home/hunter/hunterServer/rundir/conf/mongodb.conf -fork

It appears that thread 347 of the mongod process (PID 1211) is using 100% CPU. I tried mongostat and mongotop to find out which operation was responsible, but neither showed any operation running in MongoDB.

I then ran pstack 347. It looks as though mongod entered an infinite loop because of the following unhandled exception:

[root@hkDEV6 ~]# pstack 347
Thread 1 (process 347):
#0 0x00007f2fb3ab5547 in ?? () from /lib64/libgcc_s.so.1
#1 0x00007f2fb38263d6 in dl_iterate_phdr () from /lib64/libc.so.6
#2 0x00007f2fb3ab6207 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#3 0x00007f2fb3ab3603 in ?? () from /lib64/libgcc_s.so.1
#4 0x00007f2fb3ab46a6 in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
#5 0x00007f2fb3ffad15 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6 0x00000000011ba4cd in mongo::Socket::handleRecvError(int, int) ()
#7 0x00000000011bae44 in mongo::Socket::_recv(char*, int) ()
#8 0x00000000011bae59 in mongo::Socket::unsafe_recv(char*, int) ()
#9 0x00000000011baea5 in mongo::Socket::recv(char*, int) ()
#10 0x00000000011af94c in mongo::MessagingPort::recv(mongo::Message&) ()
#11 0x00000000011b28a8 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#12 0x00007f2fb44539d1 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f2fb37effbd in clone () from /lib64/libc.so.6
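
For readers unfamiliar with the call path in frames #6 to #11, the following is a minimal sketch of the general pattern: a per-connection thread reads from a socket, the read wrapper throws when the peer goes away, and the handler catches the exception and closes the connection. It is not MongoDB's actual code. The names SocketError, recvAll, handleConnection and simulateKilledClient are hypothetical, and a socket pair stands in for the killed client.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the exception a socket layer throws on failure.
struct SocketError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Read exactly len bytes or throw. recv() returning 0 means the peer closed.
void recvAll(int fd, char* buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = ::recv(fd, buf + got, len - got, 0);
        if (n == 0) throw SocketError("peer closed connection");
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            throw SocketError(std::string("recv failed: ") + std::strerror(errno));
        }
        got += static_cast<size_t>(n);
    }
}

// Per-connection handler: loops until the socket layer throws, then
// closes the descriptor and returns the reason the connection ended.
std::string handleConnection(int fd) {
    char header[16];
    try {
        for (;;) {
            recvAll(fd, header, sizeof header);
            // A real server would parse and dispatch the message here.
        }
    } catch (const SocketError& e) {
        ::close(fd);
        return e.what();
    }
}

// Simulate a killed client: one end of a socket pair is closed abruptly.
std::string simulateKilledClient() {
    int fds[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "socketpair failed";
    ::close(fds[1]);  // the "client" disappears
    return handleConnection(fds[0]);
}
```

In this sketch the throw is expected to unwind straight to the catch block. In the trace above, the thread is instead still inside the unwinder (frames #0 to #4), before any handler has run.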



 Comments   
Comment by Ramon Fernandez Marina [ 03/Aug/15 ]

ZhangKun, have you seen this behavior again? Is this still an issue for you? Given Mark's answer I'm going to close this ticket for now; if you observe this behavior again please let us know so we can re-open this ticket.

Thanks,
Ramón.

Comment by Mark Benvenuto [ 24/Jul/15 ]

_Unwind_Find_FDE calls dl_iterate_phdr using _Unwind_IteratePhdrCallback as a callback. See http://osxr.org/glibc/source/sysdeps/generic/unwind-dw2-fde-glibc.c as an example of the source code. I suspect that either the binary was corrupt or some other data structure in the binary image was corrupt. This is the only explanation I can think of for the code getting stuck in an infinite loop.
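
As an illustration of the callback mechanism mentioned above, here is a minimal sketch that uses dl_iterate_phdr (declared in <link.h> on glibc) to visit every loaded object. It only collects object names and is not the unwinder's real callback, which searches each object's unwind tables for the frame description entry covering a given program counter. The names collectObject and loadedObjects are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dl_iterate_phdr is a GNU extension
#endif
#include <link.h>
#include <cstddef>
#include <string>
#include <vector>

// Invoked once per loaded object (the executable and each shared library).
// Returning 0 tells dl_iterate_phdr to continue with the next object;
// a non-zero return stops the iteration early.
static int collectObject(struct dl_phdr_info* info, size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    const bool named = info->dlpi_name != nullptr && info->dlpi_name[0] != '\0';
    // An empty name is typically the main executable on glibc.
    names->push_back(named ? info->dlpi_name : "[unnamed object]");
    return 0;
}

// Return the names of all objects currently loaded in this process.
std::vector<std::string> loadedObjects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collectObject, &names);
    return names;
}
```

Because the iteration depends on the dynamic loader's list of objects and on each object's program headers, corrupted data in either place is one way a walk like this could misbehave, which is consistent with the suspicion stated in the comment.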

Comment by ZhangKun [X] [ 12/Jun/15 ]

Here is something I left out of the original report.
The environment is:
OS: CentOS release 6.6 (Final)
Disk: NUMA, Network disk

Generated at Thu Feb 08 03:49:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.