[SERVER-5302] Mongos Process Dying Signal 11 Created: 14/Mar/12 Updated: 15/Aug/12 Resolved: 20/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | a Rob | Assignee: | Randolph Tan |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
EC2 Linux 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux, running paster servers, pylons web framework, nginx as a reverse proxy |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
The mongos process constantly dies from an unknown cause, usually when the system is under load. The log files show a SIGSEGV (signal 11). Possibly related to |
| Comments |
| Comment by Ian Whalen (Inactive) [ 30/Jul/12 ] |
|
If that's the case, please open a new server ticket with a description and relevant details. |
| Comment by Travis Reeder [ 30/Jul/12 ] |
|
Ok, thanks Ian. We're having bad problems with mongos failing right now under heavy load with 2.0.6. |
| Comment by Ian Whalen (Inactive) [ 30/Jul/12 ] |
|
@travis, it did - you can check |
| Comment by Travis Reeder [ 30/Jul/12 ] |
|
Did this fix make it into 2.0.5 / 2.0.6? |
| Comment by Eliot Horowitz (Inactive) [ 20/Mar/12 ] |
|
actual bug is |
| Comment by Andrew Levy [ 20/Mar/12 ] |
|
I just want to emphasize how critical this issue is – we can't maintain our infrastructure with our mongos instances dying. It puts more stress on our other application servers which eventually die as the load becomes too much to handle. Do you have any estimate on a fix? Thanks! |
| Comment by Randolph Tan [ 19/Mar/12 ] |
|
It looks like this is related to https://jira.mongodb.org/browse/SERVER-5110. |
| Comment by a Rob [ 19/Mar/12 ] |
|
yes |
| Comment by Randolph Tan [ 19/Mar/12 ] |
|
Sorry for being unclear. What I meant to ask was: were you also seeing the "got not master" message in the other crashes? |
| Comment by a Rob [ 19/Mar/12 ] |
|
Yes, right before the signal 11, there is a: We've experienced this crash many times. We're also not using a binary we've built - we just renamed it mongo32. |
| Comment by Randolph Tan [ 19/Mar/12 ] |
|
Hi, have you experienced this kind of crash more than once? If yes, do you also see a "got not master" message in the logs just before the crash? |
| Comment by a Rob [ 19/Mar/12 ] |
|
| Comment by Randolph Tan [ 17/Mar/12 ] |
|
It looks like you built your own mongos binary. Is that correct? Can you try running this on that binary?
addr2line -fC -e mongos 0x225420 0x8366ae2 0x840682b 0x8406fe7 0x841aa13 0x8223df9 0x367762 0x44dd7e
Thanks! |
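For anyone repeating this step on another crash log, the addr2line request above can be scripted. The following is only a minimal sketch: the helper name `addr2line_command` and the regular expression are illustrative assumptions, not part of any MongoDB tooling. It pulls the hexadecimal addresses out of a backtrace string and builds the same `addr2line -fC -e mongos ...` invocation without running it.

```python
import re
import shlex

# Matches hexadecimal addresses such as 0x8366ae2.
HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")


def addr2line_command(backtrace, binary="mongos"):
    """Build an addr2line argument list from a raw backtrace string.

    Hypothetical helper for illustration; `binary` is the path to the
    mongos executable that produced the backtrace.
    """
    addresses = HEX_ADDR.findall(backtrace)
    if not addresses:
        raise ValueError("no hexadecimal addresses found in backtrace")
    # -f prints function names, -C demangles C++ symbols, -e names the binary.
    return ["addr2line", "-fC", "-e", binary] + addresses


if __name__ == "__main__":
    # Addresses taken from the comment above.
    trace = ("0x225420 0x8366ae2 0x840682b 0x8406fe7 "
             "0x841aa13 0x8223df9 0x367762 0x44dd7e")
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(addr2line_command(trace)))
```

The returned list can be passed to `subprocess.run` on a machine where the exact binary that crashed is available; addr2line only gives meaningful output when run against that same build.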
| Comment by a Rob [ 15/Mar/12 ] |
|
Sorry, we're running both the 32bit binary on a 32bit system and the 64bit binary on a 64bit system. The attached logs are from a 32bit system: System: Binary: |
| Comment by Randolph Tan [ 15/Mar/12 ] |
|
Hi, can you provide us with the exact OS and the version of mongos you were using for the attached log? Specifically, are you using the rc releases? You listed the environment as 64-bit Linux, but it appears that you are using the 32-bit binary. Is that correct? |
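One way to answer the 32-bit versus 64-bit question without guessing is to read the ELF header of the binary itself. This is a rough sketch under the assumption that the mongos binary is a Linux ELF executable; the function name `elf_class` and any path passed to it are hypothetical.

```python
def elf_class(path):
    """Return '32-bit' or '64-bit' for an ELF file, based on its header."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # Every ELF file starts with the four magic bytes 0x7f 'E' 'L' 'F'.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        raise ValueError("%s is not an ELF file" % path)
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    classes = {1: "32-bit", 2: "64-bit"}
    if ident[4] not in classes:
        raise ValueError("unknown ELF class %d" % ident[4])
    return classes[ident[4]]
```

Calling `elf_class` on the path of the running mongos binary, together with the output of `uname -m` on the host, gives the binary and system architecture asked for here.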