[SERVER-3927] killing clients of a loaded Mongo 1.8.3 causing seg fault Created: 22/Sep/11 Updated: 29/Feb/12 Resolved: 30/Dec/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.8.1, 1.8.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Brett Kiefer | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: |
FreeBSD trellisfc1.hq.fogcreek.com 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
This is the easiest way I have found to reproduce the issue we are seeing in production:
Repro:
Expected:
Observed:
Thu Sep 22 11:29:29 [conn109] SocketException in connThread, closing client connection
Thu Sep 22 11:29:30 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:29:30 Backtrace:
Thu Sep 22 11:29:30 dbexit:
We've seen this three times in production, and I have reproduced it on a test server. I'll try to get a build that will give me a backtrace. |
| Comments |
| Comment by Spencer Brody (Inactive) [ 30/Dec/11 ] |
|
I'm going to resolve this ticket due to lack of activity. If this is still a problem please re-open the ticket. |
| Comment by Spencer Brody (Inactive) [ 28/Nov/11 ] |
|
Hey Tim, |
| Comment by Eliot Horowitz (Inactive) [ 30/Sep/11 ] |
|
Does the process crash every time you do a map/reduce or eval, or just sometimes? Access would be great. |
| Comment by Tim Stewart [ 28/Sep/11 ] |
|
Eliot, could you rephrase your question about the JS crash? I don't understand what you're asking. I'm sure we could arrange for you to have access to the box. Let me know if this is something you'd like to do. |
| Comment by Eliot Horowitz (Inactive) [ 27/Sep/11 ] |
|
Does all JS crash or just sometimes? |
| Comment by Tim Stewart [ 26/Sep/11 ] |
|
Oh, and /usr/local/lib/libjs.so above is SpiderMonkey 1.7.0 from FreeBSD ports. |
| Comment by Tim Stewart [ 26/Sep/11 ] |
|
Hello, I work with Brett K. I did some digging to figure out why we had no backtrace. It appears that FreeBSD's libexecinfo port is returning 0 frames for the backtrace. So, no output above. I reproduced the error within GDB and got the following output (look for "info frame", "info threads", and "bt"):
It appears the crash itself is in JS_DHashTableFinish inside of libjs.so. Does this ring any bells? I can provide more output from GDB if necessary. |
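The zero-frame symptom described in this comment can be checked outside the server with a small standalone program. The following is only a minimal sketch, not MongoDB's actual signal-handling code: the function names `capture_frames` and `print_backtrace_or_warn` are hypothetical, and it assumes the platform provides `backtrace()` and `backtrace_symbols_fd()` in `<execinfo.h>` (glibc does; on FreeBSD they come from libexecinfo, so link with `-lexecinfo`).

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd() */
#include <stdio.h>
#include <unistd.h>     /* STDERR_FILENO */

#define MAX_FRAMES 64

/* Hypothetical helper: capture up to max_frames return addresses.
 * Returns the number of frames the library reports. */
int capture_frames(void **frames, int max_frames)
{
    assert(max_frames > 0);
    return (int)backtrace(frames, max_frames);
}

/* Hypothetical helper: print the current stack to stderr, or say
 * explicitly that no frames were captured instead of printing an
 * empty "Backtrace:" section. Returns the frame count. */
int print_backtrace_or_warn(void)
{
    void *frames[MAX_FRAMES];
    int n = capture_frames(frames, MAX_FRAMES);

    if (n <= 0) {
        fprintf(stderr, "backtrace() returned %d frames; no stack available\n", n);
        return n;
    }
    /* Writes one symbolized line per frame directly to the descriptor. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```

Calling `print_backtrace_or_warn()` from `main` on the affected box would show whether the library itself reports 0 frames, independent of mongod. If it does, a debugger such as GDB remains the way to get a usable stack, as done above.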
| Comment by Brett Kiefer [ 26/Sep/11 ] |
|
Okay, we rebuilt in devel mode, but we're still not getting stack traces. Same as before, blank backtraces. Any idea how we can get a stack trace?
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 dbexit:
Thu Sep 22 11:56:55 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 dbexit: ; exiting immediately
Thu Sep 22 11:56:55 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 Invalid access at address: 0x86bdb2246
Thu Sep 22 11:56:55 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 Invalid access at address: 0x86bdb2246
Thu Sep 22 11:56:55 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 closeAllFiles() finished
Thu Sep 22 11:56:55 Got signal: 11 (Segmentation fault: 11).
Thu Sep 22 11:56:55 Backtrace:
Thu Sep 22 11:56:55 ERROR: Client::~Client _context should be null but is not; client:conn |
| Comment by Brett Kiefer [ 26/Sep/11 ] |
|
We can try to repro the issue on Linux, but I don't know that it would tell us anything new about the problem - if it did seg fault, we'd be in the same place, and if it didn't, we'd have to figure that memory allocation differences on Linux are saving it. Or are you saying that right now it is generally foolish to run MongoDB on FreeBSD in production? We had another Trello outage this morning because MongoDB was not responding to queries, even though the service was up - we're getting the specifics right now. Restarting Mongo fixed it, but if MongoDB on FreeBSD just isn't stable yet, we will certainly think about moving. |
| Comment by Eliot Horowitz (Inactive) [ 23/Sep/11 ] |
|
We've seen various odd things with FreeBSD to date, and it is not a platform we fully test on yet. |