[SERVER-236] osx doesn't shutdown cleanly ( WAS: Mongo crashed on OS X - Bad file descriptor ) Created: 14/Aug/09  Updated: 12/Jul/16  Resolved: 29/Aug/09

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: None
Fix Version/s: 1.1.0

Type: Bug Priority: Major - P3
Reporter: Ryan Nitz Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

os x


Participants:

 Description   

mongodb-osx-x86_64-2009-08-06

Fri Aug 14 14:19:28 MessagingPort recv() error "Bad file descriptor" (9) Fri Aug 14 14:19:28 192.168.2.20r:55765MessagingPort recv() error "
Bad file descriptorFri Aug 14 14:19:28 " (end connection 9192.168.16.152:55765)
192.168.16.152:54997Fri Aug 14 14:19:28 Fri Aug 14 14:19:28
MessagingPort recv() error "MessagingPort recv() error "Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 Bad file descriptorBad file descriptorMessagingPort recv() error "MessagingPort recv() error "Fri Aug 14 14:19:28 end connection " (" (Bad file descriptorBad file descriptorMessagingPort recv() error "192.168.16.152:54997Fri Aug 14 14:19:28 99" (" (Bad file descriptor
MessagingPort recv() error ") ) Fri Aug 14 14:19:28 99" (Bad file descriptorFri Aug 14 14:19:28 Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 192.168.16.152:61653192.168.16.152:982MessagingPort recv() error ") ) 9" (MessagingPort recv() error "MessagingPort recv() error "MessagingPort recv() error "MessagingPort recv() error "

Bad file descriptor192.168.2.20:43242192.168.16.152:42966) 9Bad file descriptorBad file descriptorBad file descriptorBad file descriptorFri Aug 14 14:19:28 Fri Aug 14 14:19:28 " (

192.168.2.20:59878) " (" (" (" (end connection end connection 9Fri Aug 14 14:19:28 Fri Aug 14 14:19:28
192.168.2.20:163389999192.168.16.152:61653192.168.16.152:982) end connection end connection Fri Aug 14 14:19:28
) ) ) )

192.168.2.20:15570192.168.2.20:43242192.168.16.152:42966end connection Fri Aug 14 14:19:28 192.168.2.20:2008192.168.2.20:17874192.168.2.20:51922192.168.2.20:17106

192.168.2.20:59878end connection

Fri Aug 14 14:19:28
192.168.2.20:16338Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 Fri Aug 14 14:19:28 end connection
end connection end connection end connection end connection 192.168.2.20:15570192.168.2.20:2008192.168.2.20:17874192.168.2.20:51922192.168.2.20:17106

Fri Aug 14 14:19:28 closeAllFiles() finished
Fri Aug 14 14:19:28

dbexit: really exiting now



 Comments   
Comment by Eliot Horowitz (Inactive) [ 15/Sep/09 ]

released in 1.1.0

Comment by Eliot Horowitz (Inactive) [ 29/Aug/09 ]

pretty sure it was a sm gc issue that we fixed

Comment by Ryan Nitz [ 26/Aug/09 ]

Upgrading...

Comment by Eliot Horowitz (Inactive) [ 26/Aug/09 ]

what version are you running now?
can you try 0.9.10 if you haven't
one potential fix is in there

Comment by Ryan Nitz [ 26/Aug/09 ]

This just happened again, the difference this time was that a java process/driver disconnected from Mongo. Looks like the process that disconnected was in the middle of an operation.

-------------------------------------

Exception in the client:

can't say something
com.mongodb.MongoException$Network: can't say something
at com.mongodb.DBTCP.say(DBTCP.java:133)
at com.mongodb.DBMessageLayer.doUpdate(DBMessageLayer.java:42)
at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:461)
at lindex.ds.PageDs.updateAttrs(PageDs.java:134)
at lindex.telemeter.PageCrawlSensor.execute(PageCrawlSensor.java:128)
at oemware.core.MultiThread.run(MultiThread.java:164)
at java.lang.Thread.run(Thread.java:637)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
at sun.nio.ch.SocketChannelImpl.write0(SocketChannelImpl.java:372)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:388)
at java.nio.channels.SocketChannel.write(SocketChannel.java:360)
at com.mongodb.DBPort.go(DBPort.java:79)
at com.mongodb.DBPort.say(DBPort.java:64)
at com.mongodb.DBTCP.say(DBTCP.java:127)

-------------------------------------

Server:

Wed Aug 26 17:47:36 end connection 192.168.2.11:28380
Wed Aug 26 17:53:42 query lindex.moduleheartbeats ntoreturn:1 reslen:156 nreturned:1 0ms
Wed Aug 26 17:54:40 Got signal: 11 (Segmentation fault).
Wed Aug 26 17:54:40 Last op: { opid: 90667, active: 0, op: "update", ns: "lindex.moduleheartbeats", query: "

{ _id: ObjId(a13b4364da9c954aaa525d00) }

", inLock: 1, client: "192.168.2.11:51461" }
Wed Aug 26 17:54:41 Backtrace:
0x100139582 0x7fff82aca3fa 0 0x10005ae30 0x1000c4c1a 0x100092b33 0x100093333 0x1000e05f2 0x10014bd04 0x7fff82a92e8b 0x7fff82a92d4d
0 mongod 0x0000000000139582 _ZN5mongo10abruptQuitEi + 1426
1 libSystem.B.dylib 0x0000000082aca3fa _sigtramp + 26
2 ??? 0x0000000000000000 0x0 + 0
3 mongod 0x000000000005ae30 _ZN5mongo11PooledScopeD0Ev + 80
4 mongod 0x00000000000c4c1a _ZN5mongo9JSMatcherD1Ev + 106
5 mongod 0x0000000000092b33 _ZN5mongo12ClientCursorD1Ev + 179
6 mongod 0x0000000000093333 _ZN5mongo14idleTimeReportEj + 371
7 mongod 0x00000000000e05f2 _ZN5mongo11statsThreadEv + 2290
8 mongod 0x000000000014bd04 thread_proxy + 132
9 libSystem.B.dylib 0x0000000082a92e8b _pthread_start + 316
10 libSystem.B.dylib 0x0000000082a92d4d thread_start + 13
Wed Aug 26 17:54:41 dbexit:
Wed Aug 26 17:54:41 Listener on port 28017 aborted.
Wed Aug 26 17:54:41 Listener on port 27017 aborted
Wed Aug 26 17:54:41 MessagingPort recv() error "Bad file descriptor" (9) 192.168.2.11:40392
Wed Aug 26 17:54:41 end connection 192.168.2.11:40392
Wed Aug 26 17:54:41 MessagingPort recv() error "Bad file descriptor" (9) 127.0.0.1:62952Wed Aug 26 17:54:41
MessagingPort recv() error "Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Bad file descriptorMessagingPort recv() error "end connection " (Bad file descriptor127.0.0.1:629529Wed Aug 26 17:54:41 " (
) MessagingPort recv() error "9Wed Aug 26 17:54:41 192.168.2.11:1481Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Bad file descriptor) MessagingPort recv() error "
MessagingPort recv() error "MessagingPort recv() error "MessagingPort recv() error "MessagingPort recv() error "" (192.168.2.11:20680Bad file descriptorWed Aug 26 17:54:41 Bad file descriptorBad file descriptorBad file descriptorBad file descriptor9
" (end connection " (" (" (" () Wed Aug 26 17:54:41 9192.168.2.11:14819999192.168.2.11:41949end connection )
) ) ) )
192.168.2.11:20680192.168.2.11:42209192.168.2.11:43233192.168.2.11:19144192.168.2.11:3038192.168.2.11:8158Wed Aug 26 17:54:41

end connection Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 Wed Aug 26 17:54:41 192.168.2.11:41949end connection end connection end connection end connection end connection
192.168.2.11:42209192.168.2.11:43233192.168.2.11:19144192.168.2.11:3038192.168.2.11:8158

Wed Aug 26 17:54:41 closeAllFiles() finished
Wed Aug 26 17:54:41 dbexit: really exiting now
Assertion failed: (!pthread_mutex_destroy(&m)), function ~recursive_mutex, file /usr/64/include/boost/thread/pthread/recursive_mutex.hpp, line 62.
Wed Aug 26 17:54:41 Got signal: 6 (Abort trap).
Wed Aug 26 17:54:41 Last op: { opid: 90667, active: 0, op: "update", ns: "lindex.moduleheartbeats", query: "

{ _id: ObjId(a13b4364da9c954aaa525d00) }

", inLock: 1, client: "192.168.2.11:51461" }
Wed Aug 26 17:54:41 Backtrace:
0x100139582 0x7fff82aca3fa 0x7fff5fbfd910 0x10014d6da 0x10013a0f9 0x10013a44e 0x10013f7c1 0x100001294 0x2
0 mongod 0x0000000000139582 _ZN5mongo10abruptQuitEi + 1426
1 libSystem.B.dylib 0x0000000082aca3fa _sigtramp + 26
2 ??? 0x000000005fbfd910 0x0 + 1606408464
3 mongod 0x000000000014d6da _ZN5boost11this_thread5sleepERKNS_10posix_time5ptimeE + 154
4 mongod 0x000000000013a0f9 _ZN5mongo14_initAndListenEiPKc + 1193
5 mongod 0x000000000013a44e _ZN5mongo13initAndListenEiPKc + 14
6 mongod 0x000000000013f7c1 main + 21233
7 mongod 0x0000000000001294 start + 52
8 ??? 0x0000000000000002 0x0 + 2
Wed Aug 26 17:54:41 dbexit: ; exiting immediately

Comment by Eliot Horowitz (Inactive) [ 20/Aug/09 ]

ah - i see.

Comment by Ryan Nitz [ 20/Aug/09 ]


When I hit CTRL-C

Comment by Eliot Horowitz (Inactive) [ 20/Aug/09 ]

Did you try to shut down the db and that happened? Or just randomly when a client dies?

Comment by Ryan Nitz [ 20/Aug/09 ]

This should be opened again.

I can't recreate consistently (just happens). Sync issue?

Thu Aug 20 14:42:41 MessagingPort recv() error "Bad file descriptor" (9) 127.0.0.1:60101
Thu Aug 20 14:42:41 end connection 127.0.0.1:60101
Thu Aug 20 14:42:41 closeAllFiles() finished
Thu Aug 20 14:42:41 dbexit: really exiting now
Assertion failed: (!pthread_mutex_destroy(&m)), function ~recursive_mutex, file /usr/64/include/boost/thread/pthread/recursive_mutex.hpp, line 62.
Thu Aug 20 14:42:41 Got signal: 6 (Abort trap).
Thu Aug 20 14:42:41 Last op: { opid: 1670, active: 0, op: "update", ns: "lindex.moduleheartbeats", query: "

{ _id: ObjId(6f3b436413898d4a74557800) }

", inLock: 1, client: "192.168.16.152:54392" }
Thu Aug 20 14:42:41 Backtrace:
0x100139582 0x7fff82aca3fa 0x7fff5fbfd910 0x10014d6da 0x10013a0f9 0x10013a44e 0x10013f7c1 0x100001294 0x2
0 mongod 0x0000000000139582 _ZN5mongo10abruptQuitEi + 1426
1 libSystem.B.dylib 0x0000000082aca3fa _sigtramp + 26
2 ??? 0x000000005fbfd910 0x0 + 1606408464
3 mongod 0x000000000014d6da _ZN5boost11this_thread5sleepERKNS_10posix_time5ptimeE + 154
4 mongod 0x000000000013a0f9 _ZN5mongo14_initAndListenEiPKc + 1193
5 mongod 0x000000000013a44e _ZN5mongo13initAndListenEiPKc + 14
6 mongod 0x000000000013f7c1 main + 21233
7 mongod 0x0000000000001294 start + 52
8 ??? 0x0000000000000002 0x0 + 2
Thu Aug 20 14:42:41 dbexit: ; exiting immediately

Comment by Eliot Horowitz (Inactive) [ 18/Aug/09 ]

closing for now

Comment by Ryan Nitz [ 18/Aug/09 ]

Not able to recreate this. My only guess is that I had not properly updated the symlink. I would close it.

Comment by Ryan Nitz [ 17/Aug/09 ]

Nothing yet... I will move to the nightly build in the next day or so.

Comment by Eliot Horowitz (Inactive) [ 16/Aug/09 ]

any luck?
if not - try the next nightl (so download 8/17)
added some logging

Comment by Ryan Nitz [ 16/Aug/09 ]

Yes, the last stack trace is with 0.9.8 (previous ones with older nightly build). Let me see if I can consistently reproduce....

Comment by Eliot Horowitz (Inactive) [ 15/Aug/09 ]

That's definitely with 0.9.8?
Can't reproduce with anything, so if you could send a way to reproduce, would be helpful.

Comment by Ryan Nitz [ 15/Aug/09 ]

Nope... still a problem.

The problem seems to be when the client disconnects and then reconnects right away.

Sat Aug 15 11:46:33 Got signal: 11 (Segmentation fault).
Sat Aug 15 11:46:33 Last op:

{ opid: 10188, active: 0, op: "getMore", ns: "?index.pages", query: "", inLock: 1, client: "192.168.0.53:56748" }

Sat Aug 15 11:46:33 Backtrace:
0x100139582 0x7fff82aca3fa 0 0x10005ae30 0x1000c4c1a 0x100092b33 0x100093333 0x1000e05f2 0x10014bd04 0x7fff82a92e8b 0x7fff82a92d4d
0 mongod 0x0000000000139582 _ZN5mongo10abruptQuitEi + 1426
1 libSystem.B.dylib 0x0000000082aca3fa _sigtramp + 26
2 ??? 0x0000000000000000 0x0 + 0
3 mongod 0x000000000005ae30 _ZN5mongo11PooledScopeD0Ev + 80
4 mongod 0x00000000000c4c1a _ZN5mongo9JSMatcherD1Ev + 106
5 mongod 0x0000000000092b33 _ZN5mongo12ClientCursorD1Ev + 179
6 mongod 0x0000000000093333 _ZN5mongo14idleTimeReportEj + 371
7 mongod 0x00000000000e05f2 _ZN5mongo11statsThreadEv + 2290
8 mongod 0x000000000014bd04 thread_proxy + 132
9 libSystem.B.dylib 0x0000000082a92e8b _pthread_start + 316
10 libSystem.B.dylib 0x0000000082a92d4d thread_start + 13
Sat Aug 15 11:46:33 dbexit:
Sat Aug 15 11:46:33 Listener on port 28017 aborted.
Sat Aug 15 11:46:33 Listener on port 27017 aborted
Sat Aug 15 11:46:33 MessagingPort recv() error "Bad file descriptor" (9) 127.0.0.1:29657
Sat Aug 15 11:46:33 end connection 127.0.0.1:29657
Sat Aug 15 11:46:33 closeAllFiles() finished
Sat Aug 15 11:46:33 dbexit: really exiting now

Comment by Ryan Nitz [ 15/Aug/09 ]

K... it ran without failure. I don't have permission to assign/close bugs so you will have to handle.

Comment by Ryan Nitz [ 15/Aug/09 ]

I upgraded last night. I'll run the code that was causing the failure and let you know.

Comment by Eliot Horowitz (Inactive) [ 15/Aug/09 ]

any update on this?
if you're not seeing it anymore, can you close the case

Comment by Eliot Horowitz (Inactive) [ 14/Aug/09 ]

fairly sure this was fixed in 0.9.8

Comment by Ryan Nitz [ 14/Aug/09 ]

More info... switching to latest release to see if it persists.

Fri Aug 14 18:02:06 CMD: drop lindex.sites
Fri Aug 14 18:02:06 Got signal: 11 (Segmentation fault).
Fri Aug 14 18:02:06 Last op: { opid: 31244, active: 1, secs_running: 0, op: "query", ns: "lindex.$cmd", query: "

{ drop: "sites" }

", inLock: 1, client: "127.0.0.1:61175" }
Fri Aug 14 18:02:06 Backtrace:
0x1001322e2 0x7fff82aca3fa 0 0x100058870 0x1000c0a5a 0x1000900a3 0x100090ac1 0x1000ca389 0x100106573 0x1000d71c4 0x1000cd382 0x100073cd7 0x1000759c7 0x1000f15ac 0x1000f33d1 0x1001303ae 0x100145794 0x7fff82a92e8b 0x7fff82a92d4d
0 mongod 0x00000000001322e2 _ZN5mongo10abruptQuitEi + 1426
1 libSystem.B.dylib 0x0000000082aca3fa _sigtramp + 26
2 ??? 0x0000000000000000 0x0 + 0
3 mongod 0x0000000000058870 _ZN5mongo11PooledScopeD0Ev + 80
4 mongod 0x00000000000c0a5a _ZN5mongo9JSMatcherD1Ev + 106
5 mongod 0x00000000000900a3 _ZN5mongo12ClientCursorD1Ev + 179
6 mongod 0x0000000000090ac1 _ZN5mongo12ClientCursor10invalidateEPKc + 241
7 mongod 0x00000000000ca389 _ZN5mongo13deleteIndexesEPNS_16NamespaceDetailsEPKcS3_RSsRNS_14BSONObjBuilderEb + 73
8 mongod 0x0000000000106573 _ZN5mongo14dropCollectionERKSsRSsRNS_14BSONObjBuilderE + 483
9 mongod 0x00000000000d71c4 _ZN5mongo7CmdDrop3runEPKcRNS_7BSONObjERSsRNS_14BSONObjBuilderEb + 1044
10 mongod 0x00000000000cd382 _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERSt18basic_stringstreamIcSt11char_traitsIcESaIcEERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 2322
11 mongod 0x0000000000073cd7 _ZN5mongo11runCommandsEPKcRNS_7BSONObjERSt18basic_stringstreamIcSt11char_traitsIcESaIcEERNS_10BufBuilderERNS_14BSONObjBuilderEbi + 55
12 mongod 0x00000000000759c7 _ZN5mongo8runQueryERNS_7MessageERSt18basic_stringstreamIcSt11char_traitsIcESaIcEE + 6407
13 mongod 0x00000000000f15ac _ZN5mongo13receivedQueryERNS_10DbResponseERNS_7MessageERSt18basic_stringstreamIcSt11char_traitsIcESaIcEEb + 428
14 mongod 0x00000000000f33d1 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERK11sockaddr_in + 1537
15 mongod 0x00000000001303ae _ZN5mongo10connThreadEv + 686
16 mongod 0x0000000000145794 thread_proxy + 132
17 libSystem.B.dylib 0x0000000082a92e8b _pthread_start + 316
18 libSystem.B.dylib 0x0000000082a92d4d thread_start + 13
Fri Aug 14 18:02:06 dbexit:
Fri Aug 14 18:02:06 Listener on port 27017 aborted
Fri Aug 14 18:02:06 Listener on port 28017 aborted.
Fri Aug 14 18:02:06 MessagingPort recv() error "Bad file descriptor" (9) 192.168.2.20:30660
Fri Aug 14 18:02:06 end connection 192.168.2.20:30660
Fri Aug 14 18:02:06 MessagingPort recv() error "Bad file descriptor" (9) 192.168.2.20:39367
Fri Aug 14 18:02:06 end connection 192.168.2.20:39367
Fri Aug 14 18:02:06 MessagingPort recv() error "Bad file descriptor" (9) 127.0.0.1:1737
Fri Aug 14 18:02:06 end connection 127.0.0.1:1737
Fri Aug 14 18:02:06 closeAllFiles() finished
Fri Aug 14 18:02:06 dbexit: really exiting now

Generated at Thu Feb 08 02:53:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.